Abstract

The degrees are a classical and relevant way to study the topology of a network. They can be used to assess the goodness-of-fit of a given random graph model. In this paper, we introduce goodness-of-fit tests for two classes of models. First, we consider the case of independent graph models, such as the heterogeneous Erdős-Rényi model, in which the edges have different connection probabilities. Second, we consider a generic model for exchangeable random graphs called the W-graph. The stochastic block model and the expected degree distribution model fall within this framework. We prove the asymptotic normality of the degree mean square under these independent and exchangeable models and derive formal tests. We study the power of the proposed tests and prove the asymptotic normality under specific sparsity regimes. The tests are illustrated on real networks from the social sciences and ecology, and their performance is assessed via a simulation study.

Keywords: random graphs; graphon; goodness-of-fit; degree variance; W-graph.


1 Introduction 

Interaction networks are used in many fields, such as biology, sociology, ecology, economics or energy, to describe the interactions existing between a set of individuals or entities. Formally, an interaction network can be viewed as a graph whose nodes are the individuals, an edge between two nodes being present if these two individuals interact. Characterizing the general organization of such a network, namely its topology, can help in understanding the behavior of the system as a whole.

In the last decades, the distribution of the degrees (i.e. the number of connections of each node) has appeared as a simple and relevant way to study the topology of a network (Snijders, 1981; Barabási and Albert, 1999). The degree distribution can also be used to infer complex graph models (Bickel et al., 2011). From a more descriptive viewpoint, a very imbalanced distribution may reveal a network whose edges concentrate around a few nodes, whereas a multi-modal distribution may reveal the existence of clusters of nodes (Channarond et al., 2012). However, in practice, assessing the significance of such patterns remains an open problem.

The variance of the degrees has been considered since the earliest statistical studies of networks (Snijders, 1981). The first idea was simply to compare its empirical value to the expected one under a null random graph model, typically the Erdős-Rényi (ER) model (Erdős and Rényi, 1959), under which each degree has a binomial distribution. Because the ER model is rarely a reasonable null model, we define a generalized version of the degree variance statistic, which we name the degree mean square. This statistic generalizes the degree variance in the sense that it measures the discrepancy between the observed degrees and their expected values under several heterogeneous models defined hereafter.

For a given random graph under a specific model $M_0$, the degree mean square statistic is defined by

$$W_{M_0} = \frac{1}{n}\sum_{i=1}^{n}\left(D_i - \mu_i^0\right)^2,$$

where $D_i$ stands for the degree of node $i$ and $\mu_i^0$ for its expected value under model $M_0$. We propose goodness-of-fit tests for several random graph models by showing the asymptotic normality of the statistic $W_{M_0}$ under the null hypothesis and under the alternatives. In addition, because large networks are often sparse, we study under which sparsity regimes the asymptotic distributions derived before still hold.

The notations and the main models considered are the following. We consider an undirected graph $\mathcal{G} = (\{1,\ldots,n\}, \mathcal{E})$ with no self loop (a self loop being the connection of a node to itself), and denote by $Y$ the corresponding $n \times n$ adjacency matrix. Thus, the entry $Y_{ij}$ of $Y$ is 1 if $(i,j) \in \mathcal{E}$, and 0 otherwise. Because $\mathcal{G}$ is undirected with no self loop, we have $Y_{ij} = Y_{ji}$ for all $i \neq j$ and $Y_{ii} = 0$ for all $i$. We further denote by $D_i$ the degree of node $i$:

$$D_i = \sum_{j \neq i} Y_{ij}.$$

In terms of random graph models, we consider two cases: the independent case and the exchangeable one.

In the independent case, ER($p$) refers to the Erdős-Rényi model, according to which all edges $(Y_{ij})$ are independent Bernoulli variables with the same probability $p$ to exist. HER($p$) stands for the heterogeneous Erdős-Rényi model, in which edges are independent with respective probabilities $p_{ij}$ to exist. The $n \times n$ matrix $p$ with entries $p_{ij}$ is symmetric with null diagonal.
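To make the two independent models concrete, HER($p$) (and ER($p$) as its constant-$p$ special case) can be simulated by drawing each upper-triangular entry once and symmetrising. The following is a minimal sketch assuming NumPy; the names `sample_her` and `degrees` are ours, not part of the paper.

```python
import numpy as np


def sample_her(p, rng):
    """Draw the adjacency matrix Y of an undirected graph without self loops under HER(p).

    p is a symmetric n x n matrix of edge probabilities with a zero diagonal;
    each Y_ij with i < j is an independent Bernoulli(p_ij) variable.
    """
    n = p.shape[0]
    upper = np.triu(rng.random((n, n)) < p, k=1)  # keep only the draws with i < j
    return (upper | upper.T).astype(int)           # Y_ij = Y_ji, diagonal stays 0


def degrees(y):
    """D_i = sum_{j != i} Y_ij (the diagonal of Y is zero)."""
    return y.sum(axis=1)


# ER(p) is the special case where every off-diagonal entry of p equals p.
rng = np.random.default_rng(0)
n = 6
p = np.full((n, n), 0.3)
np.fill_diagonal(p, 0.0)
y = sample_her(p, rng)
d = degrees(y)
```

Drawing only the upper triangle guarantees $Y_{ij} = Y_{ji}$ and $Y_{ii} = 0$ by construction.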

In the exchangeable case, we consider a generic model for exchangeable random graphs called the W-graph (Lovász and Szegedy, 2006; Diaconis and Janson, 2008). It is based on a graphon function $\phi: [0,1]^2 \to [0,1]$ and denoted by EG($\phi$). An unobserved coordinate $U_i \sim \mathcal{U}[0,1]$ is associated with each node $i$ ($1 \le i \le n$), and edges are drawn independently conditional on the $U_i$'s as $Y_{ij} \mid U_i, U_j \sim \mathcal{B}[\phi(U_i, U_j)]$. The stochastic block model (SBM) (Holland and Leinhardt, 1979; Nowicki and Snijders, 2001) and the expected degree distribution (EDD) model (Chung and Lu, 2002) fall within this framework.


Goodness-of-fit tests for the models we consider have received little attention until now. Cerqueira et al. (2015) propose a goodness-of-fit test for the HER($p$) model when independent and identically distributed (i.i.d.) copies of the graph are available. More recently, Lei (2016) and Bickel and Sarkar (2016) derived goodness-of-fit tests for the number of communities in stochastic block models by showing the asymptotic behavior of the largest singular value of a residual adjacency matrix. Their respective null models are the ER model in Bickel and Sarkar (2016) and an SBM with $K$ communities in Lei (2016). Yang et al. (2014) propose a test statistic for the goodness-of-fit of a given graphon function and use Monte Carlo sampling to approximate its null distribution.

The paper is organized as follows. Section 2 is devoted to independent graph models and Section 3 to exchangeable ones. The performance of the proposed tests is assessed via a simulation study in Section 4.

More specifically, the asymptotic distribution of the degree mean square statistic under models HER($p$) and EG($\phi$) is derived in Sections 2.1 and 3.1, respectively. The asymptotic normality under some specific sparsity regimes is studied in Sections 2.3 and 3.3.


In Section 2.2, we establish a test for the null hypothesis stating that $\mathcal{G}$ arises from HER($p^0$) and give its power. The last part of this section is devoted to the illustration of the HER goodness-of-fit test on some examples.

In the same manner, Section 3.2 deals with the EG model and its special cases, namely the SBM and the EDD model.


2 Independent random graph models 

We consider the heterogeneous Erdős-Rényi model HER($p$), in which the edges are independent and have different respective probabilities to exist: $Y_{ij} \sim \mathcal{B}(p_{ij})$.

The asymptotic framework in the non-sparse setting is the following. We consider an infinite matrix $P$, all elements of which are bounded away from both 0 and 1. For the HER model, we then build a sequence of matrices $p^n$ made of the first $n$ rows and columns of $P$. Finally, we consider a sequence of independent graphs $\mathcal{G}_n = (\{1,\ldots,n\}, \mathcal{E}_n)$ with increasing size $n$ and respective probability matrices $p^n$. The sequence of matrices $p^{*,n}$ used in the sparse setting is constructed in a similar way, based on an infinite matrix $P^*$ with all terms bounded away from 0 and 1. All quantities computed on $\mathcal{G}_n$ should therefore be indexed by $n$ as well; for the sake of clarity, we drop the index $n$ in the rest of the paper.

2.1 Asymptotic normality 

We consider a goodness-of-fit test for the HER($p^0$) model. For a given random graph and a matrix $p^0$ of connection probabilities, we consider the following degree mean square statistic:

$$W_{p^0} = \frac{1}{n}\sum_i \left(D_i - \mu_i^0\right)^2,$$

where $D_i = \sum_{j \neq i} Y_{ij}$ and $\mu_i^0$ stands for the expected degree of node $i$ under HER($p^0$), namely $\mu_i^0 = \sum_{j \neq i} p^0_{ij}$.
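As a small illustration, $W_{p^0}$ only needs the observed degrees and the row sums of $p^0$. The sketch below assumes NumPy; the function name `degree_mean_square` is ours.

```python
import numpy as np


def degree_mean_square(y, p0):
    """W_{p0} = (1/n) sum_i (D_i - mu_i^0)^2, with mu_i^0 = sum_{j != i} p0_ij."""
    y = np.asarray(y, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    n = y.shape[0]
    d = y.sum(axis=1)      # observed degrees D_i
    mu0 = p0.sum(axis=1)   # expected degrees under HER(p0); p0 has a zero diagonal
    return float(((d - mu0) ** 2).sum() / n)


# Path graph 0 - 1 - 2 tested against p0_ij = 0.5 for all i != j:
y = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
p0 = np.full((3, 3), 0.5)
np.fill_diagonal(p0, 0.0)
w = degree_mean_square(y, p0)  # degrees (1, 2, 1), mu0 = (1, 1, 1), so W = 1/3
```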

We establish the asymptotic normality of $W_{p^0}$ under model HER($p$). The proof relies on projections of $W_{p^0}$ onto suitable spaces and on the Lindeberg-Lévy theorem (see e.g. Billingsley, 1968, Theorem 7.2, p. 42), which is recalled below. We derive all projections involved in the Hoeffding decomposition (see e.g. Chapter 11 in van der Vaart, 1998) to easily calculate the moments of $W_{p^0}$. As for the asymptotic normality, we decompose $W_{p^0}$ into the sum of its Hájek projection (see e.g. Chapter 11 in van der Vaart, 1998), to which we apply the Lindeberg-Lévy theorem, and a negligible term. A similar strategy has already been used in graph studies, for instance by Bloznelis (2005) to prove the asymptotic normality of the degree variance under model ER($p$), and by Nowicki and Wierman (1988) to prove that of subgraph counts in random graphs.


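Theorem 1 (Lindeberg-Lévy) Let $(X_{nu})_{1 \le u \le k_n}$ be a triangular array of independent random variables with means 0 and finite variances $(\sigma^2_{nu})_{1 \le u \le k_n}$. Let $B_n^2 = \sum_{u=1}^{k_n} \sigma^2_{nu}$. If the Lindeberg condition

$$A_n^2(\epsilon)/B_n^2 \to 0 \text{ as } n \to \infty, \text{ for each } \epsilon > 0, \quad \text{where } A_n^2(\epsilon) = \sum_{u=1}^{k_n} \int_{\{|X_{nu}| \ge \epsilon B_n\}} X_{nu}^2 \, dP, \qquad (1)$$

is satisfied, then

$$\frac{1}{B_n}\sum_{u=1}^{k_n} X_{nu} \xrightarrow{\mathcal{D}} \mathcal{N}(0,1).$$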

Remark 1 Consider the case of binary random variables $X_{nu}$ with mean 0. More specifically, set $X_{nu} = a_{nu} Z_{nu}$, $a_{nu} \in \mathbb{R}$, where the $Z_{nu}$ are centered Bernoulli variables, that is to say $Z_{nu}$ takes value $1 - p_{nu}$ with probability $p_{nu}$ and value $-p_{nu}$ with probability $1 - p_{nu}$. Because $|X_{nu}| \le |a_{nu}|$, the realization of the event $\{|X_{nu}| \ge \epsilon B_n\}$ in definition (1) is controlled by $|a_{nu}| \ge \epsilon B_n$. Therefore, the $X_{nu}$ for which $|a_{nu}| < \epsilon B_n$ do not contribute to $A_n^2(\epsilon)$. If this holds for all $X_{nu}$, then the Lindeberg condition is directly satisfied. If not, only the $X_{nu}$ for which it does not hold have to be considered in the calculation of $A_n^2(\epsilon)$ and, because $|Z_{nu}| \le 1$, their contribution is upper-bounded by their variance $\sigma^2_{nu} = a^2_{nu} p_{nu}(1 - p_{nu})$. In the proofs of the forthcoming theorems, we verify the Lindeberg condition using this observation.

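Theorem 2 Under model HER($p$), the statistic $W_{p^0}$ is asymptotically normal:

$$\left(W_{p^0} - \mathbb{E}_{HER(p)} W_{p^0}\right) / \mathbb{S}_{HER(p)} W_{p^0} \xrightarrow{\mathcal{D}} \mathcal{N}(0,1),$$

where $\mathbb{S}$ denotes the standard deviation and

$$\mathbb{E}_{HER(p)} W_{p^0} = \frac{2}{n}\Big(\sum_{1 \le i<j \le n} (\sigma^2_{ij} + \delta^2_{ij}) + \sum_{1 \le i<j<k \le n} (\delta_{ij}\delta_{ik} + \delta_{ij}\delta_{jk} + \delta_{ik}\delta_{jk})\Big),$$

where $\sigma^2_{ij} = p_{ij}(1 - p_{ij})$ and $\delta_{ij} = p_{ij} - p^0_{ij}$. Moreover

$$\mathbb{V}_{HER(p)} W_{p^0} = \frac{4}{n^2}\Big(\sum_{1 \le i<j \le n} \sigma^2_{ij}(1 - 2p_{ij} + \Delta_i + \Delta_j)^2 + \sum_{1 \le i<j<k \le n} (\sigma^2_{ij}\sigma^2_{ik} + \sigma^2_{ij}\sigma^2_{jk} + \sigma^2_{ik}\sigma^2_{jk})\Big),$$

with $\Delta_i = \sum_{j \neq i} \delta_{ij}$.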

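Proof. Let us begin with the calculation of the moments of $W_{p^0}$. We first observe that

$$nW_{p^0} = \sum_i \left(D_i - \mu_i + \mu_i - \mu_i^0\right)^2 = \sum_i \Big(\sum_{j \neq i} (\tilde{Y}_{ij} + \delta_{ij})\Big)^2$$

$$= 2\sum_{1 \le i<j \le n} (\tilde{Y}_{ij} + \delta_{ij})^2 + 2\sum_{1 \le i<j<k \le n} \big[(\tilde{Y}_{ij} + \delta_{ij})(\tilde{Y}_{ik} + \delta_{ik}) + (\tilde{Y}_{ij} + \delta_{ij})(\tilde{Y}_{jk} + \delta_{jk}) + (\tilde{Y}_{ik} + \delta_{ik})(\tilde{Y}_{jk} + \delta_{jk})\big],$$

where $\mu_i = \sum_{j \neq i} p_{ij}$ and $\tilde{Y}_{ij} = Y_{ij} - p_{ij}$. Then, we write the Hoeffding decomposition of $W_{p^0}$: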

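$$W_{p^0} = P_\emptyset W_{p^0} + \sum_{1 \le i<j \le n} P_{\{ij\}} W_{p^0} + \sum_{1 \le i<j<k \le n} \left(P_{\{ij,ik\}} W_{p^0} + P_{\{ij,jk\}} W_{p^0} + P_{\{ik,jk\}} W_{p^0}\right), \qquad (2)$$

where

$$P_\emptyset W_{p^0} = \mathbb{E} W_{p^0},$$

$$P_{\{ij\}} W_{p^0} = \mathbb{E}(W_{p^0} \mid Y_{ij}) - \mathbb{E} W_{p^0},$$

$$P_{\{ij,ik\}} W_{p^0} = \mathbb{E}(W_{p^0} \mid Y_{ij}, Y_{ik}) - \mathbb{E}(W_{p^0} \mid Y_{ij}) - \mathbb{E}(W_{p^0} \mid Y_{ik}) + \mathbb{E} W_{p^0}.$$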

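Combining the definitions above with the expansion of $nW_{p^0}$, and using $\tilde{Y}^2_{ij} - \sigma^2_{ij} = (1 - 2p_{ij})\tilde{Y}_{ij}$ for a centered Bernoulli variable, we obtain

$$P_\emptyset W_{p^0} = \frac{2}{n}\Big(\sum_{1 \le i<j \le n} (\sigma^2_{ij} + \delta^2_{ij}) + \sum_{1 \le i<j<k \le n} (\delta_{ij}\delta_{ik} + \delta_{ij}\delta_{jk} + \delta_{ik}\delta_{jk})\Big),$$

$$P_{\{ij\}} W_{p^0} = \frac{2}{n}\left[\tilde{Y}^2_{ij} + \tilde{Y}_{ij}(\Delta_i + \Delta_j) - \sigma^2_{ij}\right] = \frac{2}{n}\tilde{Y}_{ij}\left(1 - 2p_{ij} + \Delta_i + \Delta_j\right), \qquad (3)$$

$$P_{\{ij,ik\}} W_{p^0} = \frac{2}{n}\tilde{Y}_{ij}\tilde{Y}_{ik}. \qquad (4)$$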

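Observe now that

$$n\mathbb{E} W_{p^0} = 2\sum_{1 \le i<j \le n} (\sigma^2_{ij} + \delta^2_{ij}) + 2\sum_{1 \le i<j<k \le n} (\delta_{ij}\delta_{ik} + \delta_{ij}\delta_{jk} + \delta_{ik}\delta_{jk}).$$

Because the $\tilde{Y}_{ij}$ are independent with zero mean, the projections are all orthogonal to each other, which gives

$$n^2 \mathbb{V} W_{p^0} = n^2 \sum_{1 \le i<j \le n} \mathbb{V}(P_{\{ij\}} W_{p^0}) + n^2 \sum_{1 \le i<j<k \le n} \left(\mathbb{V}(P_{\{ij,ik\}} W_{p^0}) + \mathbb{V}(P_{\{ij,jk\}} W_{p^0}) + \mathbb{V}(P_{\{ik,jk\}} W_{p^0})\right)$$

$$= 4\sum_{1 \le i<j \le n} \sigma^2_{ij}(1 - 2p_{ij} + \Delta_i + \Delta_j)^2 + 4\sum_{1 \le i<j<k \le n} (\sigma^2_{ij}\sigma^2_{ik} + \sigma^2_{ij}\sigma^2_{jk} + \sigma^2_{ik}\sigma^2_{jk}).$$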

We now turn to the asymptotic normality of $W_{p^0}$. Let us decompose $W_{p^0}$ as follows:

$$W_{p^0} - \mathbb{E} W_{p^0} = \left(W^*_{p^0} - \mathbb{E} W_{p^0}\right) + \left(W_{p^0} - W^*_{p^0}\right),$$

where $W^*_{p^0} = P_\emptyset W_{p^0} + \sum_{1 \le i<j \le n} P_{\{ij\}} W_{p^0}$ is the Hájek projection of $W_{p^0}$, which corresponds to the first two terms of the Hoeffding decomposition. We will show that $W^*_{p^0} - \mathbb{E} W_{p^0}$ is asymptotically normal and that $W_{p^0} - W^*_{p^0}$ is a negligible term.

Consider $W^*_{p^0} - \mathbb{E} W_{p^0} = \sum_{1 \le i<j \le n} P_{\{ij\}} W_{p^0}$ and apply Theorem 1 to the projections $P_{\{ij\}} W_{p^0}$, which play the role of the $X_{nu}$. We first observe that these projections are each proportional to the $\tilde{Y}_{ij}$, which are all independent centered Bernoulli variables, so that Remark 1 applies. We denote the $a_{nu}$ by $a_{n\{ij\}}$, whose explicit expression is given in (3). We observe that $a_{n\{ij\}} = O(1)$ and $B_n^2 = \mathbb{V}(W^*_{p^0} - \mathbb{E} W_{p^0}) = O(n^2)$. This implies that the Lindeberg condition is fulfilled because, for any $\epsilon$, each $a_{nu}$ becomes smaller than $\epsilon B_n$ when $n$ goes to infinity. Now, considering the Hoeffding decomposition (2) of $W_{p^0}$, we see that

$$W_{p^0} - W^*_{p^0} = \sum_{1 \le i<j<k \le n} \left(P_{\{ij,ik\}} W_{p^0} + P_{\{ij,jk\}} W_{p^0} + P_{\{ik,jk\}} W_{p^0}\right).$$

Then we observe that $a_{n\{ij,ik\}}$, given in (4), is $O(n^{-1})$ and therefore that $\mathbb{V}(W_{p^0} - W^*_{p^0}) = O(n)$. We conclude to the asymptotic normality of $W_{p^0}$ by combining that of $W^*_{p^0} - \mathbb{E} W_{p^0}$ with the fact that $\mathbb{V}(W_{p^0} - W^*_{p^0}) / \mathbb{V} W^*_{p^0} \to 0$ as $n \to \infty$. ■
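The moment formulas of Theorem 2 lend themselves to a numerical check. The sketch below (NumPy; all function names are ours) evaluates them in $O(n^2)$ operations, grouping each triple sum by the node shared by its two pairs, and compares them on a 4-node graph with the exact moments obtained by enumerating all $2^6$ graphs. Exact agreement is expected, since $W_{p^0}$ is a polynomial of degree 2 in the centered edge variables, so the Hoeffding decomposition used in the proof is exact.

```python
import itertools

import numpy as np


def _triple_sum(x):
    """Sum over i<j<k of x_ij x_ik + x_ij x_jk + x_ik x_jk (x symmetric, zero diagonal).

    Each product is grouped by the node c shared by its two pairs:
    sum_c [(sum_j x_cj)^2 - sum_j x_cj^2] / 2.
    """
    return float(((x.sum(axis=1) ** 2) - (x ** 2).sum(axis=1)).sum() / 2.0)


def her_moments(p, p0):
    """Mean and variance of W_{p0} under HER(p), using the formulas of Theorem 2."""
    p, p0 = np.asarray(p, float), np.asarray(p0, float)
    n = p.shape[0]
    iu = np.triu_indices(n, k=1)
    sig2 = p * (1.0 - p)             # sigma_ij^2
    delta = p - p0                   # delta_ij
    big_delta = delta.sum(axis=1)    # Delta_i = sum_{j != i} delta_ij
    mean = 2.0 / n * ((sig2[iu] + delta[iu] ** 2).sum() + _triple_sum(delta))
    lin = 1.0 - 2.0 * p + big_delta[:, None] + big_delta[None, :]
    var = 4.0 / n ** 2 * ((sig2 * lin ** 2)[iu].sum() + _triple_sum(sig2))
    return float(mean), float(var)


def exact_moments(p, p0):
    """Brute-force mean and variance of W_{p0} by enumerating all graphs (tiny n only)."""
    n = p.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    mu0 = p0.sum(axis=1)
    m1 = m2 = 0.0
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        y = np.zeros((n, n))
        prob = 1.0
        for (i, j), b in zip(pairs, bits):
            y[i, j] = y[j, i] = b
            prob *= p[i, j] if b else 1.0 - p[i, j]
        w = ((y.sum(axis=1) - mu0) ** 2).sum() / n
        m1 += prob * w
        m2 += prob * w * w
    return m1, m2 - m1 ** 2


def random_sym(n, rng):
    """Hypothetical helper: symmetric matrix with zero diagonal and entries in (0.1, 0.9)."""
    a = np.triu(rng.uniform(0.1, 0.9, (n, n)), k=1)
    return a + a.T


rng = np.random.default_rng(1)
p, p0 = random_sym(4, rng), random_sym(4, rng)
print(her_moments(p, p0), exact_moments(p, p0))  # the two pairs should agree
```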


Degree variance test 

We consider the following statistic, which is the empirical degree variance, for the test of $H_0$ = ER versus $H_1$ = HER($p$):

$$V = \frac{1}{n}\sum_i \left(D_i - \bar{D}\right)^2,$$

where $\bar{D} = (1/n)\sum_i D_i$.

The variance of the degrees has naturally been considered in earlier statistical studies of networks. Hagberg (2003) derives the exact moments of the degree variance and suggests using a Gamma distribution (Hagberg, 2000). Snijders (1981) also gives the first two moments of the degree variance, but conditionally on the total number of edges. To our knowledge, the first and only proof of the asymptotic normality of the degree variance under the ER model is given in a technical report (Bloznelis, 2005). Here, we establish the asymptotic normality of $V$ under model HER($p$) and obtain the ER version as a consequence.

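Corollary 1 Under model HER($p$), the degree variance is asymptotically normal:

$$\left(V - \mathbb{E}_{HER(p)} V\right) / \mathbb{S}_{HER(p)} V \xrightarrow{\mathcal{D}} \mathcal{N}(0,1),$$

with

$$\mathbb{E}_{HER(p)} V = \frac{2(n-2)}{n^2}\sum_{1 \le i<j \le n} p_{ij} + \frac{2(n-4)}{n^2}\sum_{1 \le i<j<k \le n} (p_{ij}p_{ik} + p_{ij}p_{jk} + p_{ik}p_{jk}) - \frac{8}{n^2}\sum_{1 \le i<j<k<l \le n} (p_{ij}p_{kl} + p_{ik}p_{jl} + p_{il}p_{jk})$$

and

$$\mathbb{V}_{HER(p)} V = \frac{4}{n^4}\sum_{1 \le i<j \le n} \sigma^2_{ij}\Big((n-2) + (n-4)\sum_{k \neq i,j} (p_{ik} + p_{jk}) - 4\sum_{k<l,\ \{k,l\} \cap \{i,j\} = \emptyset} p_{kl}\Big)^2$$

$$+ \frac{4(n-4)^2}{n^4}\sum_{1 \le i<j<k \le n} (\sigma^2_{ij}\sigma^2_{ik} + \sigma^2_{ij}\sigma^2_{jk} + \sigma^2_{ik}\sigma^2_{jk}) + \frac{64}{n^4}\sum_{1 \le i<j<k<l \le n} (\sigma^2_{ij}\sigma^2_{kl} + \sigma^2_{ik}\sigma^2_{jl} + \sigma^2_{il}\sigma^2_{jk}).$$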


The proof follows that of Theorem 2 and is given in Appendix A.1.

Note that the asymptotic normality of the degree variance under model ER($p$) is a straightforward application of Corollary 1 to the case where all $p_{ij}$ are equal to $p$. We have

$$\left(V - \mathbb{E}_{ER(p)} V\right) / \mathbb{S}_{ER(p)} V \xrightarrow{\mathcal{D}} \mathcal{N}(0,1),$$

where $\mathbb{E}_{ER(p)} V = n^{-1}(n-1)(n-2)pq$ and $\mathbb{V}_{ER(p)} V = n^{-3}\, 2(n-1)(n-2)^2 pq\,(1 + (n-6)pq)$, with $q = 1 - p$, as given in Hagberg (2000).
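Corollary 1 can be evaluated numerically in the same spirit. In the sketch below (NumPy; function names are ours), the sums over triples and quadruples are obtained from row sums, by splitting all pairs of node pairs into those sharing a node and those that are node-disjoint. The result is compared with Hagberg's ER moments above and, for a random $p$ on 5 nodes, with the exact moments obtained by enumerating all $2^{10}$ graphs.

```python
import itertools

import numpy as np


def degree_variance(y):
    """V = (1/n) sum_i (D_i - mean degree)^2."""
    d = np.asarray(y, dtype=float).sum(axis=1)
    return float(((d - d.mean()) ** 2).mean())


def _pairs_of_pairs(x):
    """For symmetric x with zero diagonal, return the sums of x_e x_f over unordered
    pairs {e, f} of node pairs sharing one node (i<j<k) and node-disjoint (i<j<k<l)."""
    iu = np.triu_indices(x.shape[0], k=1)
    shared = float(((x.sum(axis=1) ** 2) - (x ** 2).sum(axis=1)).sum() / 2.0)
    tot, sq = x[iu].sum(), (x[iu] ** 2).sum()
    disjoint = float((tot ** 2 - sq - 2.0 * shared) / 2.0)
    return shared, disjoint


def v_moments_her(p):
    """Mean and variance of V under HER(p), as in Corollary 1."""
    p = np.asarray(p, float)
    n = p.shape[0]
    iu = np.triu_indices(n, k=1)
    s2 = p * (1.0 - p)
    p_sh, p_dj = _pairs_of_pairs(p)
    s_sh, s_dj = _pairs_of_pairs(s2)
    mean = (2 * (n - 2) * p[iu].sum() + 2 * (n - 4) * p_sh - 8 * p_dj) / n ** 2
    r, tot = p.sum(axis=1), p[iu].sum()
    lin = ((n - 2) + (n - 4) * (r[:, None] + r[None, :] - 2 * p)
           - 4 * (tot - r[:, None] - r[None, :] + p))  # pairs disjoint from {i, j}
    var = (4 * (s2 * lin ** 2)[iu].sum() + 4 * (n - 4) ** 2 * s_sh + 64 * s_dj) / n ** 4
    return float(mean), float(var)


def v_moments_er(n, p):
    """Hagberg's ER(p) moments of V, with q = 1 - p."""
    q = 1.0 - p
    return ((n - 1) * (n - 2) * p * q / n,
            2 * (n - 1) * (n - 2) ** 2 * p * q * (1 + (n - 6) * p * q) / n ** 3)


def v_moments_exact(p):
    """Brute-force moments of V by enumerating all graphs (tiny n only)."""
    n = p.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    m1 = m2 = 0.0
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        y = np.zeros((n, n))
        prob = 1.0
        for (i, j), b in zip(pairs, bits):
            y[i, j] = y[j, i] = b
            prob *= p[i, j] if b else 1.0 - p[i, j]
        v = degree_variance(y)
        m1, m2 = m1 + prob * v, m2 + prob * v * v
    return m1, m2 - m1 ** 2


# Constant p reduces to the ER case; a random HER(p) on 5 nodes can be checked exactly.
p_er = np.full((7, 7), 0.3)
np.fill_diagonal(p_er, 0.0)
rng = np.random.default_rng(2)
a = np.triu(rng.uniform(0.1, 0.9, (5, 5)), k=1)
p_her = a + a.T
```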


2.2 Test and power 

We now study the test of $H_0$ = HER($p^0$) versus $H_1$ = HER($p$). The next corollaries provide the null distribution of the test statistic $W_{p^0}$ and the power of the associated test.
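Corollary 2 Under model HER($p^0$), the statistic $W_{p^0}$ is asymptotically normal with moments:

$$\mathbb{E}_{HER(p^0)} W_{p^0} = \frac{2}{n}\sum_{1 \le i<j \le n} \sigma^2_{ij},$$

$$\mathbb{V}_{HER(p^0)} W_{p^0} = \frac{4}{n^2}\Big(\sum_{1 \le i<j \le n} \sigma^2_{ij}(1 - 2p^0_{ij})^2 + \sum_{1 \le i<j<k \le n} (\sigma^2_{ij}\sigma^2_{ik} + \sigma^2_{ij}\sigma^2_{jk} + \sigma^2_{ik}\sigma^2_{jk})\Big),$$

where here $\sigma^2_{ij} = p^0_{ij}(1 - p^0_{ij})$. This is a direct consequence of Theorem 2 in the special case of the HER($p^0$) model, for which all $\delta_{ij}$'s are zero ($\delta_{ij} = p_{ij} - p^0_{ij}$).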

A formal test with asymptotic level $\alpha$ can be constructed based on Corollary 2: it rejects $H_0$ as soon as $W_{p^0}$ exceeds $\mathbb{E}_{HER(p^0)} W_{p^0} + t_\alpha \mathbb{S}_{HER(p^0)} W_{p^0}$, where $t_\alpha$ stands for the $1 - \alpha$ quantile of the standard Gaussian distribution. The power of this test is given by the following corollary.

Corollary 3 The asymptotic power of the test of $H_0$ = HER($p^0$) versus $H_1$ = HER($p$) is

$$\pi(p) = 1 - \Phi\Big(\big(\mathbb{E}_{HER(p^0)} W_{p^0} + t_\alpha \mathbb{S}_{HER(p^0)} W_{p^0} - \mathbb{E}_{HER(p)} W_{p^0}\big) / \mathbb{S}_{HER(p)} W_{p^0}\Big),$$

where $\Phi$ stands for the cumulative distribution function (cdf) of the standard normal distribution and $t_\alpha = \Phi^{-1}(1 - \alpha)$.
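Once the null and alternative moments are available (for instance from the formulas of Theorem 2 and Corollary 2), the decision rule and the power of Corollary 3 take a few lines. The sketch below relies on `statistics.NormalDist` from the Python standard library; the function names are ours.

```python
from statistics import NormalDist


def her_test(w_obs, mean0, sd0, alpha=0.05):
    """One-sided test based on Corollary 2: reject H0 when W exceeds mean0 + t_alpha * sd0."""
    t_alpha = NormalDist().inv_cdf(1.0 - alpha)   # 1 - alpha quantile of N(0, 1)
    z = (w_obs - mean0) / sd0
    return z, z > t_alpha


def her_power(mean0, sd0, mean1, sd1, alpha=0.05):
    """Asymptotic power of Corollary 3; (mean0, sd0) under HER(p0), (mean1, sd1) under HER(p)."""
    t_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf((mean0 + t_alpha * sd0 - mean1) / sd1)
```

For instance, with the Trees row of Table 1 ($W_{p^0} = 140.23$, null mean 10.66, null standard deviation 2.11), `her_test` returns a statistic of about 61.4 and rejects. When the alternative moments coincide with the null ones, `her_power` returns $\alpha$, as it should.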

Degree variance test 

We now consider the use of the statistic $V$ for the test of $H_0$ = ER versus $H_1$ = HER($p$). Because the probability $p$ is unknown in practice, we consider the following test statistic, which uses a plug-in version of the moments:

$$\left(V - \mathbb{E}_{ER(\hat{p})} V\right) / \mathbb{S}_{ER(\hat{p})} V,$$

where $\hat{p} = [n(n-1)]^{-1}\sum_{i \neq j} Y_{ij}$.

The asymptotic power $\pi(p) = \mathbb{P}_p\{(V - \mathbb{E}_{ER(\hat{p})} V)/\mathbb{S}_{ER(\hat{p})} V > t_\alpha\}$ of the considered test, with nominal level $\alpha > 0$, is

$$\pi(p) = 1 - \Phi\Big(\big(\mathbb{E}_{ER(\bar{p})} V + t_\alpha \mathbb{S}_{ER(\bar{p})} V - \mathbb{E}_{HER(p)} V\big) / \mathbb{S}_{HER(p)} V\Big),$$

where $\bar{p} = [n(n-1)]^{-1}\sum_{i \neq j} p_{ij}$. This results from the asymptotic normality of $(V - \mathbb{E}_{ER(\bar{p})} V)/\mathbb{S}_{ER(\bar{p})} V$ under the HER($p$) model. Indeed, the asymptotic distribution of the test based on $(V - \mathbb{E}_{ER(\hat{p})} V)/\mathbb{S}_{ER(\hat{p})} V$ is the same as that of the test based on the statistic $(V - \mathbb{E}_{ER(p)} V)/\mathbb{S}_{ER(p)} V$ (see Lemma 2 in Appendix A.2), and we have shown that, under model ER, $(V - \mathbb{E}_{ER(p)} V)/\mathbb{S}_{ER(p)} V$ is asymptotically normal (see Lemma 1 in Appendix A.2).
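A minimal sketch of the plug-in degree variance test, assuming NumPy and the standard-library `statistics.NormalDist` for the Gaussian quantile; the function name and the two toy graphs (a star and a cycle on 20 nodes) are ours.

```python
from statistics import NormalDist

import numpy as np


def er_degree_variance_test(y, alpha=0.05):
    """Plug-in test of H0 = ER against heterogeneous alternatives, based on V."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    d = y.sum(axis=1)
    v = float(((d - d.mean()) ** 2).mean())          # empirical degree variance V
    p_hat = d.sum() / (n * (n - 1))                  # sum_{i != j} Y_ij / [n(n-1)]
    pq = p_hat * (1.0 - p_hat)
    mean = (n - 1) * (n - 2) * pq / n                # E_{ER(p_hat)} V
    var = 2 * (n - 1) * (n - 2) ** 2 * pq * (1 + (n - 6) * pq) / n ** 3
    z = (v - mean) / var ** 0.5
    return z, z > NormalDist().inv_cdf(1.0 - alpha)


n = 20
star = np.zeros((n, n), dtype=int)
star[0, 1:] = star[1:, 0] = 1                        # one hub linked to all other nodes
cycle = np.zeros((n, n), dtype=int)
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1  # every node has degree 2
```

The star, with a single hub, is flagged as far too heterogeneous for an ER model, whereas the cycle, whose degrees are all equal, gives $V = 0$ and a negative statistic, so the one-sided test does not reject.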





Remark 2 The ER($p$) model corresponds to HER($p^0$) where the matrix $p^0$ has all entries equal to $p$. In this case, the test statistic $W_{p^0}$ can be viewed as the theoretical version of the empirical variance statistic $V$ studied in Section 2.1, as

$$W_{p^0} = \frac{1}{n}\sum_i \left(D_i - (n-1)p\right)^2, \quad \text{so} \quad W_{p^0} - V = \left(\bar{D} - (n-1)p\right)^2 = (n-1)^2(\hat{p} - p)^2.$$

Because $\hat{p}$ is an average over $O(n^2)$ edges, we have $(\hat{p} - p)^2 = O_P(n^{-2})$, hence $(n-1)^2(\hat{p} - p)^2 = O_P(1)$. Combined with arguments similar to those of Corollary 1 and of the lemmas of Appendix A.2, this implies that, under the ER model, the tests based on $V$ and $W_{p^0}$ are asymptotically equivalent.
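The identity $W_{p^0} - V = (n-1)^2(\hat{p} - p)^2$ used in Remark 2 holds exactly for every graph, since $\bar{D} = (n-1)\hat{p}$. A short NumPy sketch (our own function name) confirms it on a simulated ER graph.

```python
import numpy as np


def remark2_gap(y, p):
    """Return (W_{p0} - V, (n-1)^2 (p_hat - p)^2) for p0 with all off-diagonal entries equal to p."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    d = y.sum(axis=1)
    w = float(((d - (n - 1) * p) ** 2).mean())   # W_{p0} with mu_i^0 = (n-1) p
    v = float(((d - d.mean()) ** 2).mean())      # empirical degree variance V
    p_hat = d.sum() / (n * (n - 1))
    return w - v, (n - 1) ** 2 * (p_hat - p) ** 2


rng = np.random.default_rng(3)
n, p = 30, 0.2
upper = np.triu(rng.random((n, n)) < p, k=1)
y = (upper | upper.T).astype(int)
gap, predicted = remark2_gap(y, p)
```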


2.2.1 Illustration 

We illustrate the use of the proposed test on the following series of networks. 


Ecological networks: this consists of two ecological networks first introduced in Vacher et al. (2008) and further studied in Mariadassou et al. (2010). These networks describe the interactions between a series of $n = 51$ trees and $n = 154$ fungi, respectively. In the tree network, two trees interact if they share at least one common fungal parasite. As for the fungal network, two fungi are linked if they are hosted by at least one common tree species.

Political blogs network: this consists of a set of $n = 196$ French political blogs already studied in Latouche et al. (2011). Two blogs are connected if one contains a hyperlink to the other.

Karate network: it describes the friendships between a subset of $n = 34$ members of a karate club at a US university, observed from 1970 to 1972, and was originally studied by Zachary (1977).


Faux Dixon High network: this network characterizes the (directed) friendship between $n = 248$ students. It results from a simulation based upon an exponential random graph model fit (Handcock et al., 2008) to data from one school community from the AdHealth Study, Wave I (Resnick et al., 1997).


CKM: this data set was created by Burt (1987) from the data originally collected by Coleman et al. (1966). The network we considered characterizes the friendship relationships among $n = 219$ physicians, each physician being asked to name three friends.

AdHealth 67: this data set is related to the Faux Dixon network described previously. However, it was constructed from the original data of the AdHealth study, and not simulated from any random graph model. The AdHealth study was conducted using in-school questionnaires from 1994 to 1995. Students were asked to designate their friends and to answer a series of questions. Results were collected in schools from 84 communities. In our study, we considered the network associated with school community 67, which characterizes the undirected friendship relationships between $n = 530$ students.


Several covariates on nodes are available for each network; their descriptions are given in Latouche et al. (2016). The use of these node covariates to construct covariates on edges is also explained in Latouche et al. (2016).


We first applied the degree variance test to each of these networks to check whether their topology is similar to that of an ER network. As expected, their topologies are far too heterogeneous to fit an ER($p$) model, and the null hypothesis is rejected for each of them. The question is then whether the available covariates on edges are sufficient to explain the heterogeneity of the network, at least in terms of degrees. To address this question, for each network separately, we fitted a logistic regression model stating that $\mathrm{logit}(p^0_{ij}) = x_{ij}^\top \beta$, where $\mathrm{logit}(u) = \log[u/(1-u)]$ for $u \in (0,1)$, $x_{ij} \in \mathbb{R}^d$ stands for the vector of covariates for the pair $(i,j)$ and $\beta$ for the vector of regression coefficients. This regression model provided us with an estimate of the connection probability matrix $p^0$. We then applied the degree mean square test to check whether the considered covariates are sufficient to explain the heterogeneity of the network.
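The construction of $p^0$ from edge covariates can be sketched as follows, assuming NumPy. Only the inverse-link step is shown: estimating $\beta$ (for instance by maximum likelihood over the pairs $i < j$) is left out, and the covariate and coefficient values below are invented for illustration, not taken from the data sets above.

```python
import numpy as np


def logit(u):
    """logit(u) = log(u / (1 - u)) for u in (0, 1)."""
    return np.log(u / (1.0 - u))


def p0_from_covariates(x, beta):
    """p0_ij = expit(x_ij' beta) for edge covariates x of shape (n, n, d), symmetric in (i, j)."""
    eta = x @ beta                      # linear predictor x_ij' beta, shape (n, n)
    p0 = 1.0 / (1.0 + np.exp(-eta))     # inverse of the logit link
    np.fill_diagonal(p0, 0.0)           # no self loops
    return p0


# Hypothetical example: an intercept and one symmetric edge covariate |z_i - z_j|.
rng = np.random.default_rng(4)
n = 5
z = rng.normal(size=n)
x = np.stack([np.ones((n, n)), np.abs(z[:, None] - z[None, :])], axis=-1)
beta = np.array([-1.0, 0.5])            # made-up coefficients, not estimates from the data
p0 = p0_from_covariates(x, beta)
```

The resulting matrix can then be passed to the degree mean square statistic and to the moment formulas of Corollary 2.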


Network      n    mean(p0)  st-dev(p0)  W        E(W)    S(W)   TestStat
Trees        51   0.553     0.2         140.23   10.66   2.11   61.55
Fungi        154  0.226     0.021       592.12   26.82   3.06   184.55
Blogs        196  0.075     0.112       84.82    11.05   1.2    61.5
Karate       34   0.135     0.149       3.84     3.22    0.88   0.71
Faux Dixon   248  0.02      0.037       11.34    4.41    0.43   16.05
CKM          219  0.015     0.035       3.16     3       0.32   0.5
AdHealth     530  0.007     0.008       8.77     3.43    0.24   22.27

Table 1: Degree mean square HER test. mean(p0) and st-dev(p0) are the mean and standard deviation of the estimated $p^0_{ij}$; W stands for $W_{p^0}$, E(W) for $\mathbb{E}_{HER(p^0)} W_{p^0}$, S(W) for $\mathbb{S}_{HER(p^0)} W_{p^0}$, and TestStat $= (W_{p^0} - \mathbb{E}_{HER(p^0)} W_{p^0}) / \mathbb{S}_{HER(p^0)} W_{p^0}$.
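As a consistency check, the last column of Table 1 can be recomputed from the three preceding ones. Because the displayed moments are rounded, the recomputed statistics differ slightly from the reported ones (about 61.4 instead of 61.55 for Trees); the 5% one-sided level used below to flag rejections is our choice for illustration.

```python
from statistics import NormalDist

# (W_{p0}, E_{HER(p0)} W_{p0}, S_{HER(p0)} W_{p0}, reported TestStat), copied from Table 1
table1 = {
    "Trees": (140.23, 10.66, 2.11, 61.55),
    "Fungi": (592.12, 26.82, 3.06, 184.55),
    "Blogs": (84.82, 11.05, 1.2, 61.5),
    "Karate": (3.84, 3.22, 0.88, 0.71),
    "Faux Dixon": (11.34, 4.41, 0.43, 16.05),
    "CKM": (3.16, 3.0, 0.32, 0.5),
    "AdHealth": (8.77, 3.43, 0.24, 22.27),
}

t_alpha = NormalDist().inv_cdf(0.95)   # one-sided 5% level
recomputed = {name: (w - e) / s for name, (w, e, s, _) in table1.items()}
rejected = sorted(name for name, z in recomputed.items() if z > t_alpha)
```

Only Karate and CKM fall below the threshold, which matches the conclusion drawn from Table 1.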



The results are given in Table 1: the null hypothesis $H_0$: $Y \sim$ HER($p^0$) is rejected for all networks except CKM and Karate. As for the ecological networks, these results are consistent with those of Mariadassou et al. (2010), who detected a residual heterogeneity in the valued versions of these networks after correction for these covariates.


2.3 Case of sparse graphs 

We discuss the validity of Theorem 2 for sparse graphs. Sparsity can be defined in two ways: either each connection probability vanishes as $n$ grows, or the fraction of non-zero connection probabilities decreases as $n$ grows. The following proposition deals with a combination of both scenarios.
Proposition 1 Consider the HER($p$) model with $p_{ij} = p^*_{ij} n^{-a}$, $a > 0$, $p^*_{ij} \in [0,1]$, where a fraction $1 - n^{-b}$, $b > 0$, of the $p_{ij}$'s is set to zero. The $p^0_{ij}$'s satisfy the same assumptions. Then, provided that $a + b < 2$, the statistic $W_{p^0}$ is asymptotically normal.

Proof. We will show that $W^*_{p^0} - \mathbb{E} W_{p^0}$ is asymptotically normal and then that $W_{p^0} - W^*_{p^0}$ is a negligible term. The projections $P_{\{ij\}} W_{p^0}$ involved in $W^*_{p^0} - \mathbb{E} W_{p^0}$ still play the role of the $X_{nu}$, and the $a_{n\{ij\}}$ given in (3) play the role of the $a_{nu}$ (notation of Remark 1). Since $\Delta_i = O(n^{1-a-b})$, we see that $a_{n\{ij\}} = O(n^{-(a+b)})$ if $a + b < 1$ and $O(n^{-1})$ if $a + b \ge 1$. Therefore, we have $\mathbb{V} P_{\{ij\}} W_{p^0} = O(n^{-3a-2b})$ if $a + b < 1$ and $O(n^{-a-2})$ if $a + b \ge 1$. Combining this with the number of non-zero terms, which is $O(n^{2-b})$, we get that $B_n^2 = O(n^{2-3(a+b)})$ if $a + b < 1$ and $O(n^{-(a+b)})$ if $a + b \ge 1$. Comparing $A_n^2(\epsilon)$ with $B_n^2$, we see that the Lindeberg condition is fulfilled for $a + b < 2$.

Now we consider $W_{p^0} - W^*_{p^0}$ as the sum of the projections $P_{\{ij,ik\}} W_{p^0}$. The $a_{n\{ij,ik\}}$ given in (4) equal $O(n^{-1})$, thus $\mathbb{V} P_{\{ij,ik\}} W_{p^0} = O(n^{-2a-2})$. Since the number of non-zero terms in the sum is $O(n^{3-2b})$, we therefore have $\mathbb{V}(W_{p^0} - W^*_{p^0}) = O(n^{1-2(a+b)})$.

We conclude to the asymptotic normality of $W_{p^0}$ by combining that of $W^*_{p^0} - \mathbb{E} W_{p^0}$ under condition $a + b < 2$ with the fact that $\mathbb{V}(W_{p^0} - W^*_{p^0}) / \mathbb{V} W^*_{p^0} \to 0$ as $n \to \infty$ under the same condition. ■
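The sparse regime of Proposition 1 can be simulated as follows (NumPy; the function name is ours). Which pairs receive a zero probability is not specified by the proposition, so choosing them uniformly at random, each pair being kept with probability $n^{-b}$, is our assumption, as is the constant $p^*_{ij} = 0.8$ in the example.

```python
import numpy as np


def sample_sparse_her(p_star, a, b, rng):
    """Sparse HER draw with p_ij = p*_ij n^(-a) on a random subset of about a fraction n^(-b)
    of the pairs, and p_ij = 0 on the remaining fraction of about 1 - n^(-b)."""
    n = p_star.shape[0]
    iu = np.triu_indices(n, k=1)
    keep = rng.random(iu[0].size) < n ** (-b)           # assumption: zeroed pairs chosen at random
    probs = np.zeros((n, n))
    probs[iu] = np.where(keep, p_star[iu] * n ** (-a), 0.0)
    probs = probs + probs.T                              # symmetric, zero diagonal
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return probs, (upper | upper.T).astype(int)


rng = np.random.default_rng(5)
n, a, b = 200, 0.5, 0.5                                  # a + b = 1 < 2
probs, y = sample_sparse_her(np.full((n, n), 0.8), a, b, rng)
```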

We now extend Corollary 1 on the degree variance to sparse graphs, in a setting similar to that of Proposition 1.

Corollary 4 Consider the HER($p$) model under exactly the same conditions as in Proposition 1. Then, provided that $a + b < 2$, the statistic $V$ is asymptotically normal.

The proof follows that of Proposition 1 and is given in Appendix A.3.


3 Exchangeable random graph models 


We consider EG($\phi$), a generic model for exchangeable random graphs based on a graphon function $\phi: [0,1]^2 \to [0,1]$ and commonly called the W-graph (Lovász and Szegedy, 2006; Diaconis and Janson, 2008). Under EG($\phi$), a coordinate $U_i \sim \mathcal{U}[0,1]$ is associated with each node $i$ ($1 \le i \le n$), and edges are drawn independently conditional on the $U_i$'s as

$$Y_{ij} \mid U_i, U_j \sim \mathcal{B}[\phi(U_i, U_j)].$$

Note that this model generalizes the HER($p$) model considered in Section 2, since we retrieve the latter when the latent variables are fixed, meaning $\phi(u_i, u_j) = p_{ij}$.

Several popular graph models fall within the framework of exchangeable random graph models. We only mention two.


Stochastic Block Model (SBM). The SBM (Holland and Leinhardt, 1979; Nowicki and Snijders, 2001) is a mixture model for random graphs (Daudin et al., 2008) in which a discrete variable $Z_i \in \{1, \ldots, K\}$ is associated with each node, and edges are drawn conditionally as $Y_{ij} \mid Z_i, Z_j \sim \mathcal{B}(\pi_{Z_i Z_j})$, where $\pi = [\pi_{k\ell}]_{k,\ell}$ stands for the so-called connectivity matrix. Indeed, the SBM corresponds to a W-graph with a block-wise constant graphon function (Latouche and Robin, 2016).


Expected degree distribution (EDD). The EDD model is an exchangeable version of the expected degree sequence model studied in Chung and Lu (2002) and of the configuration model from Newman (2003). Under these two models, the degree of each node is fixed, which makes them non-exchangeable. Under the EDD model, an expected degree $K_i$ (not necessarily an integer) is first drawn independently and identically for each node from some distribution $G$, and the edges are drawn independently conditional on the $K_i$'s as $Y_{ij} \mid K_i, K_j \sim \mathcal{B}[K_i K_j / k]$, so that $\mathbb{E}(D_i \mid K_i) \propto K_i$. The EDD model corresponds to a W-graph with a product-form graphon function, $\phi(u, v) = g(u)g(v)$, taking $g(u) = G^{-1}(u)/\sqrt{k}$. Young and Scheinerman (2007) consider a specific case of this model.
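The W-graph sampling scheme, together with the two special cases above, can be sketched as follows (NumPy; function names and example graphons are ours). The SBM graphon is written by splitting $[0,1]$ into consecutive intervals of lengths $\alpha_k$, the group proportions (a notation also used in Section 3.1.1), which is one common way of writing a block-wise constant graphon. The function $g$ in the EDD example is a made-up choice, not $G^{-1}(u)/\sqrt{k}$ for a specific $G$.

```python
import numpy as np


def sample_wgraph(phi, n, rng):
    """Draw U_i ~ U[0, 1], then Y_ij | U_i, U_j ~ B(phi(U_i, U_j)) independently for i < j."""
    u = rng.random(n)
    probs = phi(u[:, None], u[None, :])                 # n x n matrix of phi(U_i, U_j)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return u, (upper | upper.T).astype(int)


def sbm_graphon(alpha, pi):
    """Block-wise constant graphon: u falls in group k when it lies in the k-th
    interval of length alpha_k of [0, 1]; phi(u, v) = pi[k(u), k(v)]."""
    cuts = np.cumsum(alpha)[:-1]
    pi = np.asarray(pi, dtype=float)

    def phi(u, v):
        return pi[np.searchsorted(cuts, u, side="right"), np.searchsorted(cuts, v, side="right")]

    return phi


def product_graphon(g):
    """EDD-type graphon phi(u, v) = g(u) g(v); g must be vectorised and keep phi in [0, 1]."""
    return lambda u, v: g(u) * g(v)


rng = np.random.default_rng(6)
phi_sbm = sbm_graphon([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
phi_edd = product_graphon(lambda u: 0.2 + 0.6 * u)      # hypothetical g
u, y = sample_wgraph(phi_sbm, 50, rng)
```

Conditionally on the sampled coordinates `u`, the draw is exactly an HER graph with $p_{ij} = \phi(U_i, U_j)$, which mirrors the remark above that EG($\phi$) generalizes HER($p$).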

The asymptotic framework for the EG($\phi$) model mimics that of HER($p$) described in Section 2, with $\phi_{ij} := \phi(U_i, U_j)$ replacing $p_{ij}$.


3.1 Asymptotic normality 

We propose a goodness-of-fit test for the W-graph model. For a given graphon $\phi^0$, we consider the following degree mean square statistic:

$$W_{\phi^0} = \frac{1}{n}\sum_i \left(D_i - (n-1)\phi^0_1\right)^2,$$

where $\phi^0_1$ stands for the marginal probability for any given edge to exist, namely $\phi^0_1 = \iint \phi^0(u, v)\, du\, dv$. We establish the asymptotic normality of $W_{\phi^0}$ under model EG($\phi$).
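The marginal edge probability $\phi^0_1$ has no closed form for a general graphon. A minimal sketch, assuming NumPy and a vectorised graphon, approximates the double integral with a midpoint rule on an $m \times m$ grid (the default $m = 400$ is an arbitrary choice) and then evaluates $W_{\phi^0}$; the function names are ours.

```python
import numpy as np


def marginal_edge_prob(phi, m=400):
    """phi_1 = double integral of phi over [0, 1]^2, approximated by an m x m midpoint rule."""
    t = (np.arange(m) + 0.5) / m
    return float(phi(t[:, None], t[None, :]).mean())


def w_phi0(y, phi1_0):
    """W_{phi0} = (1/n) sum_i (D_i - (n - 1) phi_1^0)^2."""
    d = np.asarray(y, dtype=float).sum(axis=1)
    n = d.size
    return float(((d - (n - 1) * phi1_0) ** 2).sum() / n)


# phi^0(u, v) = u v (a product graphon with g(u) = u) has phi_1^0 = (1/2)^2 = 1/4.
phi1_0 = marginal_edge_prob(lambda u, v: u * v)
triangle = np.ones((3, 3)) - np.eye(3)
```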


The proof relies on a central limit theorem for acyclic patterns from Bickel et al. (2011), which is recalled hereafter.


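Consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, with $\mathcal{V} = \{1, \ldots, n\}$, generated by $\phi$, and let $R$ be a subset of $\{(i,j) : 1 \le i < j \le n\}$. The graph associated with $R$, which is actually a pattern of $\mathcal{G}$, is denoted by $\mathcal{G}_R = (\mathcal{V}_R, \mathcal{E}_R)$, with $\mathcal{E}_R = \mathcal{E} \cap R$. Then, for a given $R$, we define $P(R)$ and its empirical version $\hat{P}(R)$ as follows:

$$P(R) = \mathbb{P}\{\mathcal{E}_R = R\}, \qquad \hat{P}(R) = \binom{n}{p}^{-1} N(R)^{-1} \sum_{\mathcal{G}_S \subseteq \mathcal{G}} \mathbb{1}(\mathcal{G}_S \sim \mathcal{G}_R), \qquad (5)$$

where $\sim$ stands for the isomorphism relation, $N(R)$ is the number of graphs isomorphic to $R$ and $p = |\mathcal{V}_R|$.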
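The three empirical frequencies needed in the proof of Theorem 4 (an edge $R_1$, a two-star $R_2$ and a triangle $R_3$) can be obtained from the adjacency matrix without enumerating node triples. In the sketch below (NumPy; names are ours) we assume, consistently with the expansion used in that proof, that $\hat{P}$ counts the node subsets whose induced subgraph is isomorphic to the pattern, and that $N(R)$ is the number of labelled copies of $R$ on $p$ given nodes (1 for an edge or a triangle, 3 for a two-star).

```python
from math import comb

import numpy as np


def pattern_frequencies(y):
    """Empirical frequencies P_hat(R_1), P_hat(R_2), P_hat(R_3) for an edge, a two-star and a
    triangle, counting node subsets whose induced subgraph is isomorphic to the pattern."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    d = y.sum(axis=1)
    edges = d.sum() / 2.0
    triangles = np.trace(y @ y @ y) / 6.0
    two_stars = (d * (d - 1) / 2.0).sum()          # paths i - c - j, triangles included
    induced_two_stars = two_stars - 3.0 * triangles
    return (edges / comb(n, 2),                    # N(R_1) = 1
            induced_two_stars / (3 * comb(n, 3)),  # N(R_2) = 3 (choice of the centre)
            triangles / comb(n, 3))                # N(R_3) = 1


rng = np.random.default_rng(7)
n = 12
upper = np.triu(rng.random((n, n)) < 0.4, k=1)
y = (upper | upper.T).astype(int)
p1, p2, p3 = pattern_frequencies(y)
```

With these conventions, $n_1\hat{P}(R_1)$, $n_2\hat{P}(R_2)$ and $n_2\hat{P}(R_3)$ are twice the number of edges, twice the number of induced two-stars and six times the number of triangles, respectively ($n_t$ is defined after Theorem 4).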
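Theorem 3 (Bickel et al.) Consider a set of fixed patterns $\{R_1, \ldots, R_k\}$ with $|\mathcal{V}(R_i)| \le p$ and $\iint (\phi(u,v)/\phi_1)^{2|\mathcal{E}_R|}\, du\, dv < \infty$ (where $\phi = \phi^{(n)}$ and $\phi_1 = \phi_1^{(n)}$). Suppose that $(n-1)\phi_1$ is of order $n^{1-2/p}$ or higher. Then

$$\sqrt{n}\left(\big(\tilde{P}(R_1), \ldots, \tilde{P}(R_k)\big) - \big(\mathbb{E}\tilde{P}(R_1), \ldots, \mathbb{E}\tilde{P}(R_k)\big)\right) \xrightarrow{\mathcal{D}} \mathcal{N}(0, \Sigma_R),$$

where $\tilde{P}(R_j) = \hat{P}(R_j)\, \hat{\phi}_1^{-|\mathcal{E}_{R_j}|}$ and $\hat{\phi}_1 / \phi_1 \to 1$.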

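Theorem 4 Under model EG($\phi$), the statistic $W_{\phi^0}$ is asymptotically normal:

$$\left(W_{\phi^0} - \mathbb{E}_\phi W_{\phi^0}\right) / \mathbb{S}_\phi W_{\phi^0} \xrightarrow{\mathcal{D}} \mathcal{N}(0,1),$$

with moments

$$\mathbb{E}_\phi W_{\phi^0} = n^{-1}\left\{n(n-1)^2(\phi^0_1)^2 + [1 - 2(n-1)\phi^0_1]\, n_1\phi_1 + n_2\phi_2\right\},$$

$$\mathbb{V}_\phi W_{\phi^0} = n^{-2}\Big\{4[1 - 2(n-1)\phi^0_1]^2\Big(\frac{n_1}{2}\phi_1 + n_2\phi_2 + \frac{n_3}{4}\phi_1^2 - \frac{n_1^2}{4}\phi_1^2\Big)$$

$$+ 8[1 - 2(n-1)\phi^0_1]\Big(\frac{n_2}{2}(2\phi_2 + \phi_3) + \frac{n_3}{2}(\phi_5 + 2\phi_6) + \frac{n_4}{4}\phi_1\phi_2 - \frac{n_1 n_2}{4}\phi_1\phi_2\Big)$$

$$+ 4\Big(\frac{n_2}{6}(3\phi_2 + 6\phi_3) + \frac{n_3}{2}(4\phi_4 + 2\phi_5 + 2\phi_6 + \phi_7) + \frac{n_4}{4}(4\phi_8 + \phi_9 + 4\phi_{10}) + \frac{n_5}{4}\phi_2^2 - \frac{n_2^2}{4}\phi_2^2\Big)\Big\},$$

where $n_t = \prod_{\ell=0}^{t}(n - \ell)$ and $\phi_j$ denotes the probability $P$ of pattern $R_j$ given in Figure 1, as defined in Bickel et al. (2011): $\phi_j = P(R_j)$.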


Figure 1: Definition of the patterns $R_1$ to $R_{10}$ involved in the calculation of the moments of the W statistics.


Proof. The proof relies on the fact that the statistic $W_{\phi^0}$ is a linear combination of the $\hat{P}(R_j)$ of three particular patterns $R_j$, to which we will apply Theorem 3. Let us begin with the calculation of the moments of $W_{\phi^0}$. First observe that

$$\sum_i \left[D_i - (n-1)\phi^0_1\right]^2 = n(n-1)^2(\phi^0_1)^2 + 2[1 - 2(n-1)\phi^0_1]\sum_{1 \le i<j \le n} Y_{ij} + 2\sum_{1 \le i<j<k \le n} (Y_{ij}Y_{ik} + Y_{ij}Y_{jk} + Y_{ik}Y_{jk})$$

$$= n(n-1)^2(\phi^0_1)^2 + 2[1 - 2(n-1)\phi^0_1]\, M_1 + 2M_2,$$
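where

$$M_1 = \sum_{1 \le i<j \le n} Y_{ij}, \qquad M_2 = \sum_{1 \le i<j<k \le n} (Y_{ij}Y_{ik} + Y_{ij}Y_{jk} + Y_{ik}Y_{jk}).$$

Then we see that

$$\mathbb{E} M_1 = \frac{n_1}{2}\phi_1 \quad \text{and} \quad \mathbb{E} M_2 = \frac{n_2}{2}\phi_2,$$

which gives $\mathbb{E}_\phi W_{\phi^0}$.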

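Next, we calculate the three following expectations (calculation details are given in Appendix A.4):

$$\mathbb{E}(M_1^2) = \frac{n_1}{2}\phi_1 + n_2\phi_2 + \frac{n_3}{4}\phi_1^2,$$

$$\mathbb{E}(M_1 M_2) = \frac{n_2}{2}(2\phi_2 + \phi_3) + \frac{n_3}{2}(\phi_5 + 2\phi_6) + \frac{n_4}{4}\phi_1\phi_2,$$

$$\mathbb{E}(M_2^2) = \frac{n_2}{6}(3\phi_2 + 6\phi_3) + \frac{n_3}{2}(4\phi_4 + 2\phi_5 + 2\phi_6 + \phi_7) + \frac{n_4}{4}(4\phi_8 + 4\phi_{10} + \phi_9) + \frac{n_5}{4}\phi_2^2,$$

which give $\mathbb{V}_\phi W_{\phi^0}$.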

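We now turn to the asymptotic normality of $W_{\phi^0}$. Using definition (5) of $\hat{P}$ and that of $\tilde{P}$ given in Theorem 3, we observe that

$$nW_{\phi^0} = \sum_i \left[D_i - (n-1)\phi^0_1\right]^2$$

$$= n(n-1)^2(\phi^0_1)^2 + 2[1 - 2(n-1)\phi^0_1]\sum_{i<j} Y_{ij} + 2\sum_{i<j<k}\left[Y_{ij}Y_{ik}(1 - Y_{jk}) + Y_{ij}Y_{jk}(1 - Y_{ik}) + Y_{ik}Y_{jk}(1 - Y_{ij})\right] + 6\sum_{i<j<k} Y_{ij}Y_{ik}Y_{jk}$$

$$= n(n-1)^2(\phi^0_1)^2 + [1 - 2(n-1)\phi^0_1]\, n_1 \hat{P}(R_1) + n_2 \hat{P}(R_2) + n_2 \hat{P}(R_3)$$

$$= n(n-1)^2(\phi^0_1)^2 + [1 - 2(n-1)\phi^0_1]\, n_1 \hat{\phi}_1 \tilde{P}(R_1) + n_2 \hat{\phi}_1^2 \tilde{P}(R_2) + n_2 \hat{\phi}_1^3 \tilde{P}(R_3), \qquad (6)$$

where $R_1$, $R_2$ and $R_3$ are depicted in Figure 1. Thus we obtain the following linear combination of $\tilde{P}(R_1)$, $\tilde{P}(R_2)$ and $\tilde{P}(R_3)$:

$$n^{-3/2}\left(W_{\phi^0} - (n-1)^2(\phi^0_1)^2\right) = O(n^{1/2}) \times \hat{\phi}_1 \tilde{P}(R_1) + O(n^{1/2}) \times \hat{\phi}_1^2 \tilde{P}(R_2) + O(n^{1/2}) \times \hat{\phi}_1^3 \tilde{P}(R_3). \qquad (7)$$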

Since $\hat{\phi}_1 \xrightarrow{P} \phi_1$ by Theorem 3, we conclude by applying the asymptotic normality result of the same theorem to the right-hand side of Equation (7), combined with Slutsky's lemma. Note that the condition $\iint (\phi(u,v)/\phi_1)^{2|\mathcal{E}_{R_j}|}\, du\, dv < \infty$, $j = 2, 3$, is fulfilled because $\phi \le 1$ and $\phi_1$ is a constant. ■
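The moments of Theorem 4 are easy to evaluate once $\phi_1, \ldots, \phi_{10}$ are known. The sketch below (plain Python; names are ours) codes them and checks them in the ER case, where EG($\phi$) with a constant graphon $p$ coincides with HER($p^0$) for a constant $p^0 = p$, so that the result must match Corollary 2. In that case $\phi_j = p^{e_j}$, with $e_j$ the number of edges of $R_j$; since Figure 1 is not reproduced here, the edge counts below are our reading of the patterns implied by the moment formulas and should be checked against the figure.

```python
from math import comb


def falling(n, t):
    """n_t = n (n - 1) ... (n - t), as defined after Theorem 4."""
    out = 1
    for ell in range(t + 1):
        out *= n - ell
    return out


def wgraph_moments(n, phi, phi1_0):
    """Mean and variance of W_{phi0} under EG(phi) from Theorem 4; phi = (phi_1, ..., phi_10)."""
    f1, f2, f3, f4, f5, f6, f7, f8, f9, f10 = phi
    n1, n2, n3, n4, n5 = (falling(n, t) for t in range(1, 6))
    c = 1 - 2 * (n - 1) * phi1_0
    mean = (n * (n - 1) ** 2 * phi1_0 ** 2 + c * n1 * f1 + n2 * f2) / n
    var_m1 = n1 / 2 * f1 + n2 * f2 + n3 / 4 * f1 ** 2 - n1 ** 2 / 4 * f1 ** 2
    cov_m1_m2 = (n2 / 2 * (2 * f2 + f3) + n3 / 2 * (f5 + 2 * f6)
                 + n4 / 4 * f1 * f2 - n1 * n2 / 4 * f1 * f2)
    var_m2 = (n2 / 6 * (3 * f2 + 6 * f3) + n3 / 2 * (4 * f4 + 2 * f5 + 2 * f6 + f7)
              + n4 / 4 * (4 * f8 + f9 + 4 * f10) + n5 / 4 * f2 ** 2 - n2 ** 2 / 4 * f2 ** 2)
    var = (4 * c ** 2 * var_m1 + 8 * c * cov_m1_m2 + 4 * var_m2) / n ** 2
    return mean, var


# Numbers of edges of R_1, ..., R_10 (our reading of Figure 1; only these counts matter below).
EDGES = (1, 2, 3, 4, 3, 3, 4, 4, 4, 4)


def er_check(n, p):
    """Under ER(p), phi_j = p^(#edges of R_j); compare with Corollary 2 (constant p0 = p)."""
    q = 1 - p
    theorem4 = wgraph_moments(n, [p ** e for e in EDGES], p)
    corollary2 = ((n - 1) * p * q,
                  4 / n ** 2 * (comb(n, 2) * p * q * (1 - 2 * p) ** 2
                                + 3 * comb(n, 3) * (p * q) ** 2))
    return theorem4, corollary2
```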


Remark 3 The test statistics $W_{p^0}$ in the independent case and $W_{\phi^0}$ in the exchangeable case both measure the discrepancy between the observed degrees and their expected values under specific models. Let us stress that the latent layer in the exchangeable case induces an additional variability of the degrees. The third term in Equation (7) is a consequence of this additional variability.
3.1.1 Particular cases: SBM and EDD

Because the SBM and the EDD model are special cases of the W-graph, all results above apply to them. Interestingly, for both models, the critical calculation of the coefficients $\phi_1$ to $\phi_{10}$ can be achieved exactly. Indeed, the calculation of the first two moments of pattern counts under the SBM and the EDD model is explicitly addressed in Picard et al. (2008). This reference already observes that patterns 4 to 10 from Figure 1 need to be considered as 'super-patterns' (or 'super-motifs') of patterns 2 and 3, and that the variance of the count of a given pattern depends on the expected frequency of its super-patterns.

The formula of $\phi_j$ for SBM is given explicitly in Picard et al. (2008). Denoting $\alpha_k$ the probability for any given node to belong to group $k$ ($1 \le k \le K$), we have that

$$
\phi_j = \sum_{k_1=1}^{K} \cdots \sum_{k_{p_j}=1}^{K} \alpha_{k_1} \cdots \alpha_{k_{p_j}} \prod_{1 \le u < v \le p_j} \pi_{k_u k_v}^{m^j_{uv}},
$$

where $p_j$ stands for the number of nodes in pattern $R_j$ and $m^j_{uv}$ is 1 if nodes $u$ and $v$ are connected in pattern $R_j$ and 0 otherwise.
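The SBM formula can be evaluated by brute force, summing over all $K^{p_j}$ block assignments of the pattern nodes. The Python sketch below is one such evaluation, assuming $\pi$ is given as a symmetric $K \times K$ nested list of connection probabilities and a pattern is encoded as its number of nodes plus the list of node pairs with $m^j_{uv} = 1$; the function and pattern names are hypothetical. The cost grows as $K^{p_j}$, which remains modest for the small patterns involved here.

```python
from itertools import product

# Patterns as (number of nodes, list of edges); hypothetical encoding.
EDGE = (2, [(0, 1)])
V_SHAPE = (3, [(0, 1), (0, 2)])            # two-star centred at node 0
TRIANGLE = (3, [(0, 1), (0, 2), (1, 2)])


def phi_sbm(alpha, pi, pattern):
    """phi_j = sum over block labels of prod(alpha_k) * prod over pattern edges of pi."""
    p, edges = pattern
    total = 0.0
    for labels in product(range(len(alpha)), repeat=p):
        weight = 1.0
        for k in labels:
            weight *= alpha[k]
        for u, v in edges:
            weight *= pi[labels[u]][labels[v]]
        total += weight
    return total


alpha = [0.5, 0.5]
pi = [[0.4, 0.1],
      [0.1, 0.2]]
print(phi_sbm(alpha, pi, EDGE))     # approximately 0.2
print(phi_sbm(alpha, pi, V_SHAPE))  # approximately 0.0425
```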


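The EDD model is also studied in Picard et al. (2008) but needs to be adapted to the W-graph framework. For $\Phi(u, v) = g(u)g(v)$, we have that

$$
\phi_j = \prod_{u=1}^{p_j} \bar g_{d^j_u}, \qquad \text{where } \bar g_k = \int_0^1 g^k(u)\,du
$$

and $d^j_u$ stands for the degree of node $u$ within the pattern $R_j$. Some examples are

$$
\phi_1 = \bar g_1^2, \quad \phi_2 = \bar g_1^2 \bar g_2, \quad \phi_3 = \bar g_2^3, \quad \phi_4 = \bar g_1 \bar g_2^2 \bar g_3, \quad \phi_{10} = \bar g_1^3 \bar g_2 \bar g_3.
$$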
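Because the EDD expression only involves the moments $\bar g_k$ and the degree sequence of each pattern, it is straightforward to evaluate numerically. The sketch below assumes $g$ is supplied as a Python function on $[0, 1]$ and approximates each $\bar g_k$ with a midpoint rule; the grid size, the function names and the choice $g(u) = u$ in the usage lines are illustrative assumptions. With $g(u) = u$ one has $\bar g_k = 1/(k+1)$, so the examples above give $\phi_1 = 1/4$, $\phi_2 = 1/12$, $\phi_3 = 1/27$ and $\phi_4 = 1/72$.

```python
def g_moment(g, k, m=20_000):
    """Midpoint-rule approximation of bar g_k = int_0^1 g(u)^k du."""
    h = 1.0 / m
    return h * sum(g((t + 0.5) * h) ** k for t in range(m))


def pattern_degrees(n_nodes, edges):
    """Degree d_u of every node u within the pattern."""
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def phi_edd(g, n_nodes, edges):
    """phi_j = prod_u bar g_{d_u} for the graphon Phi(u, v) = g(u) g(v)."""
    moments = {}
    out = 1.0
    for d in pattern_degrees(n_nodes, edges):
        if d not in moments:
            moments[d] = g_moment(g, d)
        out *= moments[d]
    return out


def g(u):
    return u


# Triangle with a pendant edge: degrees (3, 2, 2, 1), hence phi = g1 * g2^2 * g3.
TRI_PENDANT = (4, [(0, 1), (0, 2), (1, 2), (0, 3)])
print(phi_edd(g, *TRI_PENDANT), 1 / 72)  # close to each other
```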


3.2 Test and power 

We now study the test of $H_0 = EG(\Phi^0)$ versus $H_1 = EG(\Phi)$. The next corollaries provide the null distribution of the test statistic $W_{\Phi^0}$ and the power of the associated test. They are direct consequences of Theorem 4.


Corollary 5 Under the model based on $\Phi^0$, the statistic $W_{\Phi^0}$ is asymptotically normal with moments expressed as those of Theorem 4, with all $\phi_j$ replaced by $\phi_j^0$.


Recall that the particular terms $\delta_{ij} = p_{ij} - p^0_{ij}$ appear in the moments of $W_{p^0}$ under model HER($p$), whereas this is not the case under HER($p^0$) (see Theorem 2 and Corollary 2 in Sections 2.1 and 2.2). Notice that this simple measure of discrepancy between two alternative models is not visible in the moments of $W_{\Phi^0}$, but is spread out over all differences between $\phi_j$ and $\phi_j^0$.


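A formal test with asymptotic level $\alpha$ can be constructed based on Corollary 5, which rejects $H_0$ as soon as $W_{\Phi^0}$ exceeds $\mathbb{E}_{\Phi^0}W_{\Phi^0} + t_\alpha \mathbb{S}_{\Phi^0}W_{\Phi^0}$, where $t_\alpha$ is the $(1 - \alpha)$ quantile of the standard normal distribution. The expression of its power follows.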
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Corollary 6 The asymptotic power of the considered test is 


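$$
\pi(\Phi) = 1 - F_{\mathcal N}\left(\left(\mathbb{E}_{\Phi^0}W_{\Phi^0} + t_\alpha \mathbb{S}_{\Phi^0}W_{\Phi^0} - \mathbb{E}_{\Phi}W_{\Phi^0}\right) / \mathbb{S}_{\Phi}W_{\Phi^0}\right),
$$
where $F_{\mathcal N}$ denotes the standard normal cumulative distribution function.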
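In practice, the decision rule and the power of Corollary 6 only require the null and alternative moments of $W_{\Phi^0}$. The short Python sketch below assumes these four moments have already been computed (for instance from Theorem 4) and implements both formulas with the standard library's `statistics.NormalDist`; the function names and the default level $\alpha = 5\%$ are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def reject_h0(w_obs, mean0, sd0, alpha=0.05):
    """Reject H0 when W exceeds E_{Phi0} W + t_alpha * S_{Phi0} W."""
    t_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return w_obs > mean0 + t_alpha * sd0


def asymptotic_power(mean0, sd0, mean1, sd1, alpha=0.05):
    """pi(Phi) = 1 - F((E_{Phi0} W + t_alpha S_{Phi0} W - E_Phi W) / S_Phi W)."""
    t_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf((mean0 + t_alpha * sd0 - mean1) / sd1)


# Under the null (identical moments) the power reduces to the level alpha.
print(asymptotic_power(10.0, 2.0, 10.0, 2.0))  # approximately 0.05
# A shift of the mean by three null standard deviations is detected with high power.
print(asymptotic_power(10.0, 2.0, 16.0, 2.0))
```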


Remark 4 Let us consider the test of $H_0$ = ER versus $H_1 = EG(\Phi)$. This simply corresponds to the degree variance test based on the statistic $V$ described in Section 2.2.


3.2.1 Illustration 

As an illustration of the proposed test, we consider the networks described in Section 2.2.1. The question is whether a fitted graphon is sufficient to explain the heterogeneity of a network, at least in terms of degrees. To address this question, for each network separately, we estimated a graphon function: the variational expectation maximization algorithm of Daudin et al. (2008) provides estimates of the SBM parameters, from which we built the corresponding block-wise constant graphon function. The number of blocks was estimated using the model selection criterion considered in Daudin et al. (2008). This is implemented in the package mixer (available on https://cran.r-project.org/). We then calculated the moments of the graphon and applied the degree mean square test to check whether the fitted graphon is sufficient to explain the heterogeneity of the network. The results are given in Table 2.


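$$
\begin{array}{lrrrrrrr}
\text{Network} & n & \text{density} & K & W_{\Phi^0} & \mathbb{E}_{EG(\Phi^0)}W_{\Phi^0} & \mathbb{S}_{EG(\Phi^0)}W_{\Phi^0} & \text{TestStat} \\
\hline
\text{Tree} & 51 & 0.54 & 5 & 163.14 & 162.84 & 17.31 & 0.02 \\
\text{Fungi} & 154 & 0.227 & 15 & 597.6 & 584.42 & 116.63 & 0.11 \\
\text{Blog} & 196 & 0.075 & 11 & 104.72 & 92.77 & 25.89 & 0.46 \\
\text{Karate} & 34 & 0.139 & 4 & 14.6 & 15.57 & 6.16 & -0.16 \\
\text{FauxDixon} & 248 & 0.02 & 5 & 16.78 & 11.97 & 1.94 & 2.48 \\
\text{CKM} & 219 & 0.015 & 3 & 3.9 & 4.04 & 0.76 & -0.18 \\
\text{AdHealth} & 530 & 0.007 & 4 & 10.7 & 7.54 & 1.42 & 2.22
\end{array}
$$

Table 2: Degree mean square test for an SBM-graphon. $\text{TestStat} = (W_{\Phi^0} - \mathbb{E}_{EG(\Phi^0)}W_{\Phi^0})/\mathbb{S}_{EG(\Phi^0)}W_{\Phi^0}$.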


Using the normal approximation for the distribution of $W_{\Phi^0}$ under $H_0$, the $EG(\Phi^0)$ model is rejected for two of these networks: FauxDixon and AdHealth. The highest test statistic is observed for the FauxDixon network, which has actually been simulated under a model that does not belong to the class of $EG(\Phi)$ models.
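The last column of Table 2 and the rejection decisions can be recomputed from the three preceding columns. The Python sketch below does so with a one-sided threshold, as in the rejection rule of Section 3.2, at an assumed level of 5% (the level is not stated in the text); small differences in the second decimal come from the rounding of the displayed moments. The names `TABLE_2`, `standardized` and `p_value` are illustrative.

```python
from statistics import NormalDist

# (network, W_obs, null mean, null standard deviation, reported TestStat) from Table 2
TABLE_2 = [
    ("Tree", 163.14, 162.84, 17.31, 0.02),
    ("Fungi", 597.6, 584.42, 116.63, 0.11),
    ("Blog", 104.72, 92.77, 25.89, 0.46),
    ("Karate", 14.6, 15.57, 6.16, -0.16),
    ("FauxDixon", 16.78, 11.97, 1.94, 2.48),
    ("CKM", 3.9, 4.04, 0.76, -0.18),
    ("AdHealth", 10.7, 7.54, 1.42, 2.22),
]


def standardized(w_obs, mean0, sd0):
    return (w_obs - mean0) / sd0


def p_value(z):
    """One-sided p-value: large values of W indicate extra degree heterogeneity."""
    return 1 - NormalDist().cdf(z)


threshold = NormalDist().inv_cdf(0.95)  # t_alpha for an assumed alpha = 5%
for name, w, m, s, reported in TABLE_2:
    z = standardized(w, m, s)
    verdict = "reject" if z > threshold else "keep"
    print(f"{name:10s} z = {z:+.2f} (table: {reported:+.2f})  p = {p_value(z):.3f}  {verdict}")
```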

3.3 Case of sparse graphs 

The following proposition discusses the validity of Theorem 4 when considering sparse graphs, namely when $\phi_1 = \phi_1(n)$ vanishes as $n$ grows, at a rate we specify.
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Proposition 2 Under the model based on the graphon $\Phi$ such that $\phi_1$ and $\phi_1^0$ are of order $n^{-2/3}$ or higher, if $\iint (\Phi(u,v)/\phi_1(n))^6\,du\,dv < \infty$, then the statistic $W_{\Phi^0}$ is asymptotically normal.

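Proof. We apply Theorem 3 to a function of $W_{\Phi^0}$ which is a linear combination of $\hat P(R_1)$, $\hat P(R_2)$ and $\hat P(R_3)$ involving the quantity $\hat\phi_1/\phi_1$, where $R_1$, $R_2$ and $R_3$ refer to the patterns from Figure 1. Equations (6)-(7) state that

$$
\begin{aligned}
n^{-3/2}\left(W_{\Phi^0} - (n-1)^2(\phi_1^0)^2\right) = {}& O(\phi_1^0\phi_1) \times O(n^{1/2}) \times \frac{\hat\phi_1}{\phi_1}\hat P(R_1) \\
&+ O(\phi_1^2) \times O(n^{1/2}) \times \left(\frac{\hat\phi_1}{\phi_1}\right)^2 \hat P(R_2) \\
&+ O(\phi_1^3) \times O(n^{1/2}) \times \left(\frac{\hat\phi_1}{\phi_1}\right)^3 \hat P(R_3).
\end{aligned}
$$

The asymptotic normality of $\sqrt{n}\left(\hat P(R_1), \hat P(R_2), \hat P(R_3)\right)$ holds under the conditions $\iint (\Phi(u,v)/\phi_1)^{2|E_R|}\,du\,dv < \infty$ with $|E_R| \le 3$, and $\phi_1$ being of order $n^{-2/p}$ or higher with $p = 3$. Now, we observe that, under the condition that $\phi_1$ and $\phi_1^0$ are of order $n^{-a}$ for $0 < a < 2/3$,

$$
\begin{aligned}
n^{-3/2+2a}\left(W_{\Phi^0} - (n-1)^2(\phi_1^0)^2\right) = {}& O(n^{1/2}) \times \frac{\hat\phi_1}{\phi_1}\hat P(R_1) + O(n^{1/2}) \times \left(\frac{\hat\phi_1}{\phi_1}\right)^2 \hat P(R_2) \\
&+ O(n^{1/2-a}) \times \left(\frac{\hat\phi_1}{\phi_1}\right)^3 \hat P(R_3).
\end{aligned}
$$

Since $\hat\phi_1/\phi_1 \xrightarrow{P} 1$ by Theorem 3, we conclude by applying the asymptotic normality result of the same theorem to the right-hand side of the equality above, combined with Slutsky's lemma. Note that the third term mentioned in Remark 3 is negligible. ■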



4 Simulation study 

We designed a simulation study to assess the performance of the tests described above. More specifically, our purpose is to evaluate the power of these tests for various graph sizes and densities (mean connectivities). We also aim to illustrate for which graph sizes the asymptotic normal approximation is accurate; we especially focus on this point in the sparse regime.
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4.1 Simulation design 

Design for the independent case. We designed our simulation to mimic the situation where a heterogeneous model HER($p^0$) is considered, which still misses some heterogeneity. More specifically, each node $i$ was associated with a vector of covariates $x_i \in \mathbb{R}^d$ (all values were drawn i.i.d. from the standard Gaussian distribution and $d$ was set to 3). Each edge $(i, j)$ was then associated with the covariate vector $x_{ij} = \sqrt{\pi}\,|x_i - x_j|/2$ (absolute value taken componentwise), so that all $x_{ij}$ are positive with mean 1 (each coordinate of $x_i - x_j$ is $\mathcal N(0, 2)$, whose absolute value has mean $2/\sqrt{\pi}$). The edges were then drawn according to a logistic model: $\text{logit}(p_{ij}) = \alpha + x_{ij}^\top \beta_1$, where $\beta_1 = [\beta_0^\top\ \beta]^\top \in \mathbb{R}^d$ and $\beta_0 \in \mathbb{R}^{d-1}$. The constant $\alpha$ was set to preserve the mean connectivity, denoted $p^*$ in the sequel.

The probability matrix $p^0 = [p^0_{ij}]$ of the null model was defined according to the same logistic model, removing the last covariate, namely $\text{logit}(p^0_{ij}) = \alpha_0 + x^{0\top}_{ij}\beta_0$, where $x^0_{ij}$ is $x_{ij}$ deprived of its last coordinate. Hence, the discrepancy between the null hypothesis and the true model is measured by the coefficient $\beta$ of the last covariate. All entries of $\beta_0$ were set to 1, while $\beta$ ranged from 0 to 2.
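A minimal Python sketch of this design is given below, using only the standard library. It assumes that both intercepts $\alpha$ and $\alpha_0$ are calibrated by bisection so that the average of the $p_{ij}$ (respectively $p^0_{ij}$) over pairs equals $p^*$, and that the degree mean square is the average squared deviation of each degree from its expectation under HER($p^0$), that is $W_{p^0} = n^{-1}\sum_i (D_i - \sum_{j \neq i} p^0_{ij})^2$; the bisection bounds and all function names are hypothetical. When $\beta = 0$ the true and null models coincide, which gives a simple sanity check.

```python
import math
import random


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def edge_covariates(n, d=3, seed=0):
    """x_i ~ N(0, I_d); x_ij = sqrt(pi) * |x_i - x_j| / 2 componentwise (mean 1)."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return {(i, j): [math.sqrt(math.pi) * abs(x[i][c] - x[j][c]) / 2 for c in range(d)]
            for i in range(n) for j in range(i + 1, n)}


def probabilities(cov, intercept, beta):
    return {e: logistic(intercept + sum(b * v for b, v in zip(beta, xs)))
            for e, xs in cov.items()}


def calibrate_intercept(cov, beta, p_star, lo=-30.0, hi=30.0, iters=60):
    """Bisection on the intercept so that the mean edge probability equals p_star."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        probs = probabilities(cov, mid, beta)
        if sum(probs.values()) / len(probs) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def her_design(n, beta_last, p_star, d=3, seed=0):
    """Return (p, p0): true model with all d covariates, null model without the last one."""
    cov = edge_covariates(n, d, seed)
    null_cov = {e: xs[:-1] for e, xs in cov.items()}
    beta_full = [1.0] * (d - 1) + [beta_last]
    beta_null = [1.0] * (d - 1)
    p = probabilities(cov, calibrate_intercept(cov, beta_full, p_star), beta_full)
    p0 = probabilities(null_cov, calibrate_intercept(null_cov, beta_null, p_star), beta_null)
    return p, p0


def degree_mean_square(n, p, p0, seed=1):
    """Draw a graph from HER(p) and return n^-1 sum_i (D_i - E_{p0} D_i)^2."""
    rng = random.Random(seed)
    deg, expected = [0] * n, [0.0] * n
    for (i, j), pij in p.items():
        if rng.random() < pij:
            deg[i] += 1
            deg[j] += 1
        expected[i] += p0[(i, j)]
        expected[j] += p0[(i, j)]
    return sum((d - e) ** 2 for d, e in zip(deg, expected)) / n


p, p0 = her_design(n=60, beta_last=1.5, p_star=0.1)
print(sum(p.values()) / len(p), sum(p0.values()) / len(p0))  # both close to 0.1
print(degree_mean_square(60, p, p0))
```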


Design for the exchangeable case. We designed a situation where a null block-wise constant graphon $\Phi^0$, associated with an SBM, is contaminated by an alternative graphon of the form considered in Latouche and Robin (2016). Thus, graphs were sampled from an $EG(\Phi)$ model where $\Phi(u, v) = \rho\,\beta^2 u^{\beta-1} v^{\beta-1}\,\Phi^0(u, v)$. Note that $\Phi$ induces a random graph model related to the degree-corrected SBM of Karrer and Newman (2011), which has received strong attention in the last five years. This model, by characterizing explicitly the degrees of the vertices, is often employed as an alternative to the standard SBM. Note, however, that in its original form the degree-corrected SBM is not exchangeable since the degree parameters are fixed. Conversely, $\Phi$ induces an exchangeable model here since the degree terms $u^{\beta-1}$ and $v^{\beta-1}$ are random. For the null graphon $\Phi^0$, we considered an SBM with 2 blocks with equal proportions. Moreover, $\Phi^0$ was given a product form such that $\Phi^0(u, v) = \eta_k \eta_\ell$ if $u$ and $v$ are in blocks $k$ and $\ell$, respectively. We set $\eta_1 = 0.4$ and $\eta_2 = 0.5$. In this simulation framework, the discrepancy between the null hypothesis and the true model is measured by the term $\beta$, which ranges from 1 to 2 and controls the imbalance of the expected degrees of the nodes. The null graphon is retrieved when $\beta = 1$. Finally, the term $\rho$ was set in order to obtain the desired mean connectivity $p^*$.
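Because $\Phi^0$ has a product form and $\int_0^{1/2} \beta u^{\beta-1}\,du = 2^{-\beta}$, the mean connectivity of $\Phi$ factorizes as $\rho A_\beta^2$ with $A_\beta = \eta_1 2^{-\beta} + \eta_2 (1 - 2^{-\beta})$, assuming block 1 corresponds to $u < 1/2$. This gives $\rho = p^*/A_\beta^2$ in closed form. The Python sketch below uses this to sample degrees from the design and computes the degree mean square $W_{\Phi^0} = n^{-1}\sum_i (D_i - (n-1)\phi_1^0)^2$ against a null mean connectivity $\phi_1^0 = p^*$; the closed-form $\rho$, the block convention, the choice of $\phi_1^0$ and all function names are illustrative assumptions.

```python
import random

ETA = (0.4, 0.5)  # eta_1, eta_2; block 1 is u < 1/2, block 2 is u >= 1/2 (assumed convention)


def eta(u):
    return ETA[0] if u < 0.5 else ETA[1]


def rho_for_density(beta, p_star):
    """rho such that iint Phi(u, v) du dv = p_star (closed form for the product graphon)."""
    a = ETA[0] * 0.5 ** beta + ETA[1] * (1 - 0.5 ** beta)
    return p_star / a ** 2


def graphon(u, v, beta, rho):
    """Phi(u, v) = rho * beta^2 * u^(beta-1) * v^(beta-1) * Phi0(u, v)."""
    return rho * beta ** 2 * u ** (beta - 1) * v ** (beta - 1) * eta(u) * eta(v)


def sample_degrees(n, beta, p_star, seed=0):
    """Latent U_i ~ U[0, 1], then independent edges with probability Phi(U_i, U_j)."""
    if beta < 1:
        raise ValueError("the design uses beta in [1, 2]")
    rho = rho_for_density(beta, p_star)
    if rho * beta ** 2 * max(ETA) ** 2 > 1:  # maximum of Phi when beta >= 1
        raise ValueError("Phi exceeds 1 for these parameters")
    rng = random.Random(seed)
    U = [rng.random() for _ in range(n)]
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(U[i], U[j], beta, rho):
                deg[i] += 1
                deg[j] += 1
    return deg


def degree_mean_square(deg, phi1_null):
    """W_{Phi0} = n^-1 sum_i (D_i - (n - 1) * phi1_null)^2."""
    n = len(deg)
    return sum((d - (n - 1) * phi1_null) ** 2 for d in deg) / n


deg = sample_degrees(n=200, beta=1.5, p_star=0.1)
print(degree_mean_square(deg, phi1_null=0.1))
```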


Note that in both designs, the density of the network is kept constant, equal to $p^*$, when moving away from the null model. Therefore, the departure from $H_0$ detected by the tests is not due to a mean degree difference. In both designs, $\beta$ measures the departure from the null model, although its nominal values are not comparable from one design to the other. 1000 simulations were run for each combination of the parameters $(n, p^*, \beta)$.


Sparse graphs. For both tests, we considered sparse graphs in the settings described in Sections ?? and 3.3. We focused on the asymptotic normality of the degree mean square statistic under the null hypothesis. To this aim, we designed a reference null probability
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matrix $p^{0*}$ and a reference null graphon $\Phi^{0*}$ as described above. We then considered two sparsity scenarios:

• vanishing connection probabilities: $p^0_{ij} = p^{0*}_{ij} n^{-a}$ and $\Phi(u, v) = n^{-a}\Phi^{0*}(u, v)$;

• sparse connection probabilities: $p^0_{ij} = p^{0*}_{ij}$ with probability $n^{-b}$ and 0 otherwise.

The second scenario does not make sense for the $EG(\Phi)$ test. The mean connectivity $p^*$ was set to 0.1. The density of the graphs therefore decreases as $p^* n^{-a}$ and $p^* n^{-b}$, respectively.
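The two sparsity scenarios can be generated from any reference probability matrix. The sketch below assumes, purely for illustration, a constant reference matrix with $p^{0*}_{ij} = 0.1$ (the reference matrix of the design is heterogeneous) and stores probabilities in a dictionary indexed by pairs $i < j$; the function names are hypothetical.

```python
import random


def vanishing(p_ref, n, a):
    """Scenario 1: every reference probability is multiplied by n^(-a)."""
    return {e: q * n ** (-a) for e, q in p_ref.items()}


def sparse_support(p_ref, n, b, seed=0):
    """Scenario 2: keep p_ref[e] with probability n^(-b), set it to 0 otherwise."""
    rng = random.Random(seed)
    keep = n ** (-b)
    return {e: (q if rng.random() < keep else 0.0) for e, q in p_ref.items()}


def density(p):
    return sum(p.values()) / len(p)


n = 200
p_ref = {(i, j): 0.1 for i in range(n) for j in range(i + 1, n)}  # p* = 0.1
print(density(vanishing(p_ref, n, a=0.5)))       # p* n^(-a), about 0.0071
print(density(sparse_support(p_ref, n, b=0.5)))  # close to p* n^(-b)
```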

Criteria. For each parameter configuration, we computed the moments of the respective 
statistics and derived the theoretical power. Based on the replicates, we estimated the 
empirical power. For the sparse setting, the proximity to the normal distribution was investigated by plotting the empirical quantiles versus the theoretical Gaussian quantiles (QQ-plots).

4.2 Results 

Power and asymptotic normality. The power curves of the degree mean square tests in the independent and exchangeable cases are given in Figures 2 and 3, respectively. As expected, the power increases with the departure $\beta$, the graph size $n$ and the network density $p^*$. We recall that the departure parameter $\beta$ cannot be compared between the two figures. The binomial confidence interval around the theoretical power informs us about the convergence to the asymptotic normal distribution. We observe that the empirical power (dots) falls within this interval, showing that the normal approximation is accurate for reasonably large graphs ($n > 100$). This accuracy also depends on the density of the graph; it is satisfactory for $p^* > 1\%$ in the independent case and $p^* > 3\%$ in the exchangeable case.
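The confidence band drawn in Figures 2 and 3 can be reproduced from the theoretical power alone. The sketch below assumes a 95% normal-approximation (Wald) interval for a binomial proportion estimated from 1000 replicates; the exact type of interval used for the figures is not specified here, so this is an illustrative choice, and the function names are hypothetical.

```python
import math


def binomial_band(theoretical_power, n_sim=1000, z=1.96):
    """Wald interval power +/- z * sqrt(power * (1 - power) / n_sim), clipped to [0, 1]."""
    half_width = z * math.sqrt(theoretical_power * (1 - theoretical_power) / n_sim)
    return max(0.0, theoretical_power - half_width), min(1.0, theoretical_power + half_width)


def agrees_with_theory(n_rejections, theoretical_power, n_sim=1000):
    """True when the empirical power falls inside the binomial band."""
    lo, hi = binomial_band(theoretical_power, n_sim)
    return lo <= n_rejections / n_sim <= hi


print(binomial_band(0.5))            # roughly (0.469, 0.531)
print(agrees_with_theory(523, 0.5))  # True: 0.523 lies in the band
```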

Sparse graphs. Figures 4 and 5 display the QQ-plots of the standardized $W_{p^0}$ and $W_{\Phi^0}$ statistics under the vanishing probabilities scenario for graphs of several sizes. Remember that the larger the exponent $a$, the sparser the graph. We observe again that normality holds for non-sparse graphs ($a = 0$) even for $n = 100$, but the departure is visible for $n = 100$ as soon as $a > 0.4$. The same is observed for $n = 1000$, although a bit later ($a > 0.8$). For the largest graph ($n = 10\,000$), normality holds until $a \approx 1.2$ to $1.4$ but does not seem to be reached for higher sparsity regimes. As expected, in the very sparse regime, normality can only be relied on for very large graphs. Similar conclusions can be drawn for the sparse probabilities scenario, each distribution being slightly closer to normal.
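The QQ-plots of Figures 4 and 5 compare the ordered standardized statistics with standard normal quantiles. A minimal sketch of the computation follows; it uses the plotting positions $(r - 1/2)/k$, $r = 1, \dots, k$ (the default of R's `qqnorm` for more than ten points), and standardizes either with supplied moments or with the empirical ones. Which standardization was used for the figures is an assumption here, and `qq_points` is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values, mu=None, sigma=None):
    """Pairs (theoretical N(0, 1) quantile, sorted standardized value) for a QQ-plot.

    The statistic is standardized with (mu, sigma) when given, e.g. the theoretical
    null moments, and with the empirical mean and standard deviation otherwise."""
    mu = mean(values) if mu is None else mu
    sigma = stdev(values) if sigma is None else sigma
    z = sorted((v - mu) / sigma for v in values)
    k = len(z)
    theoretical = [NormalDist().inv_cdf((r + 0.5) / k) for r in range(k)]
    return list(zip(theoretical, z))


for q, zq in qq_points([3.1, 2.4, 4.0, 2.9, 3.6]):
    print(f"{q:+.3f} {zq:+.3f}")
```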
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Figure 2: Power of the degree mean square test in the HER design, as a function of $\beta$, the effect of the last covariate. Top left ($\log_{10} p^* = -2.5$), top right ($\log_{10} p^* = -2$), bottom left ($\log_{10} p^* = -1.5$), bottom right ($\log_{10} p^* = -1$). Color refers to the graph size: $n = 32$ (red), 100 (green), 316 (blue), 1 000 (cyan) (green, blue and cyan curves and points overlap in the last panels). Points = empirical power (average over 1000 simulations): dotted points = $W_{p^0}$ test; solid line = theoretical power; dotted line = binomial confidence interval for 1000 simulations.


A Appendix 

A.1 Proof of Corollary 1

Let us express $V$ as follows.

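$$
\begin{aligned}
n^2 V = n\sum_{i=1}^{n} (D_i - \bar D)^2 = {}& 2(n-2)\sum_{1 \le i < j \le n} Y_{ij} \\
&+ 2(n-4)\sum_{1 \le i < j < k \le n} \{Y_{ij}Y_{ik} + Y_{ij}Y_{jk} + Y_{ik}Y_{jk}\} \\
&- 8\sum_{1 \le i < j < k < l \le n} \{Y_{ij}Y_{kl} + Y_{ik}Y_{jl} + Y_{il}Y_{jk}\}, \qquad (8)
\end{aligned}
$$

where $\bar D$ denotes the mean degree.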
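Equation (8) is a purely combinatorial identity, so it can be checked on any graph. The Python sketch below compares $n^2 V$, with $V = n^{-1}\sum_i (D_i - \bar D)^2$, to the right-hand side of (8) on a random graph; the helper names are hypothetical. For a graph reduced to a single edge, only the first sum is non-zero and both sides equal $2(n-2)$.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=0):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph (illustrative helper)."""
    rng = random.Random(seed)
    Y = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            Y[i][j] = Y[j][i] = 1
    return Y


def degree_variance(Y):
    """V = n^-1 sum_i (D_i - mean degree)^2."""
    n = len(Y)
    deg = [sum(row) for row in Y]
    mean_deg = sum(deg) / n
    return sum((d - mean_deg) ** 2 for d in deg) / n


def rhs_eq8(Y):
    """2(n-2) * edges + 2(n-4) * two-stars - 8 * pairs of disjoint edges."""
    n = len(Y)
    edges = sum(Y[i][j] for i, j in combinations(range(n), 2))
    stars = sum(Y[i][j] * Y[i][k] + Y[i][j] * Y[j][k] + Y[i][k] * Y[j][k]
                for i, j, k in combinations(range(n), 3))
    disjoint = sum(Y[i][j] * Y[k][l] + Y[i][k] * Y[j][l] + Y[i][l] * Y[j][k]
                   for i, j, k, l in combinations(range(n), 4))
    return 2 * (n - 2) * edges + 2 * (n - 4) * stars - 8 * disjoint


Y = random_graph(10, 0.4, seed=3)
print(len(Y) ** 2 * degree_variance(Y), rhs_eq8(Y))  # equal up to rounding
```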
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Figure 3: Power of the degree mean square test in the EG design, as a function of $\beta$, which controls the degree imbalance. Same legend as in Figure 2.


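Then we write the Hoeffding decomposition of $V$:

$$
\begin{aligned}
V = {}& P_\emptyset V + \sum_{1 \le i < j \le n} P_{\{ij\}}V + \sum_{1 \le i < j < k \le n} \left[P_{\{ij,ik\}}V + P_{\{ij,jk\}}V + P_{\{ik,kj\}}V\right] \\
&+ \sum_{1 \le i < j < k < l \le n} \left\{P_{\{ij,kl\}}V + P_{\{ik,jl\}}V + P_{\{il,jk\}}V\right\}. \qquad (9)
\end{aligned}
$$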

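Taking all projections with respect to HER($p$), we have

$$
P_\emptyset V = \frac{1}{n^2}\Big(2(n-2)\sum_{1 \le i < j \le n} p_{ij} + 2(n-4)\sum_{1 \le i < j < k \le n} \{p_{ij}p_{ik} + p_{ij}p_{jk} + p_{ik}p_{jk}\}\Big) - \frac{8}{n^2}\sum_{1 \le i < j < k < l \le n} \{p_{ij}p_{kl} + p_{ik}p_{jl} + p_{il}p_{jk}\},
$$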
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Figure 4: QQ-plots of the degree mean square statistic $W_{p^0}$ in the HER design, for vanishing connection probabilities: $p^0_{ij} = p^{0*}_{ij} n^{-a}$ and initial mean density $p^* = 0.1$. From top left to bottom right: $a = 0$, 0.4, 0.8, 1.2, 1.4, 1.6. Graph size $n = 100$ (+), 1000 (×) and 10 000 (o).


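which gives the expectation. The other projections provide the variance. We have, denoting $\tilde Y_{ij} = Y_{ij} - p_{ij}$,

$$
P_{\{ij\}}V = \frac{\tilde Y_{ij}}{n^2}\Big(2(n-2) + 2(n-4)\sum_{k \notin \{i,j\}} (p_{ik} + p_{jk}) - 8\sum_{k<l,\ k,l \notin \{i,j\}} p_{kl}\Big), \qquad (10)
$$

$$
P_{\{ij,ik\}}V = \frac{2(n-4)}{n^2}\,\tilde Y_{ij}\tilde Y_{ik} \quad \text{and} \quad P_{\{ij,kl\}}V = -\frac{8}{n^2}\,\tilde Y_{ij}\tilde Y_{kl}. \qquad (11)
$$

So, denoting $\sigma^2_{ij} = p_{ij}(1 - p_{ij})$,

$$
n^4\,\mathbb{V}P_{\{ij\}}V = \sigma^2_{ij}\Big(2(n-2) + 2(n-4)\sum_{k \notin \{i,j\}} (p_{ik} + p_{jk}) - 8\sum_{k<l,\ k,l \notin \{i,j\}} p_{kl}\Big)^2, \qquad (12)
$$

$$
n^4\,\mathbb{V}P_{\{ij,ik\}}V = 4(n-4)^2\,\sigma^2_{ij}\sigma^2_{ik} \quad \text{and} \quad n^4\,\mathbb{V}P_{\{ij,kl\}}V = 64\,\sigma^2_{ij}\sigma^2_{kl}, \qquad (13)
$$

and the variance of $V$ follows by summing over all indexes.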
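Since $n^2 V$ is a polynomial of degree two in independent Bernoulli variables, the Hoeffding decomposition (9) is exact, and the projections give the exact mean and variance of $V$ for any finite $n$. The Python sketch below checks this on small heterogeneous examples ($n = 5$ and $n = 6$, so that all $2^{10}$ or $2^{15}$ graphs can be enumerated); the probability matrices and the function names are arbitrary illustrative choices.

```python
from itertools import combinations


def prob(p, i, j):
    return p[(min(i, j), max(i, j))]


def projection_moments(p, n):
    """E V from P_empty V, and Var V by summing (12) and (13) over all indexes."""
    nodes, pairs = range(n), list(combinations(range(n), 2))
    s2 = {e: p[e] * (1 - p[e]) for e in pairs}
    mean = (2 * (n - 2) * sum(p[e] for e in pairs)
            + 2 * (n - 4) * sum(prob(p, i, j) * prob(p, i, k) + prob(p, i, j) * prob(p, j, k)
                                + prob(p, i, k) * prob(p, j, k)
                                for i, j, k in combinations(nodes, 3))
            - 8 * sum(prob(p, i, j) * prob(p, k, l) + prob(p, i, k) * prob(p, j, l)
                      + prob(p, i, l) * prob(p, j, k)
                      for i, j, k, l in combinations(nodes, 4))) / n ** 2
    var = 0.0
    for i, j in pairs:  # first-order projections, Equation (12)
        others = [k for k in nodes if k not in (i, j)]
        c = (2 * (n - 2) + 2 * (n - 4) * sum(prob(p, i, k) + prob(p, j, k) for k in others)
             - 8 * sum(prob(p, k, l) for k, l in combinations(others, 2)))
        var += s2[(i, j)] * c ** 2
    for e, f in combinations(pairs, 2):  # second-order projections, Equation (13)
        var += (4 * (n - 4) ** 2 if set(e) & set(f) else 64) * s2[e] * s2[f]
    return mean, var / n ** 4


def exact_moments(p, n):
    """Brute-force E V and Var V by enumerating every graph on n nodes."""
    pairs = list(combinations(range(n), 2))
    m1 = m2 = 0.0
    for mask in range(1 << len(pairs)):
        weight, deg = 1.0, [0] * n
        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                weight *= p[(i, j)]
                deg[i] += 1
                deg[j] += 1
            else:
                weight *= 1 - p[(i, j)]
        mean_deg = sum(deg) / n
        v = sum((d - mean_deg) ** 2 for d in deg) / n
        m1 += weight * v
        m2 += weight * v * v
    return m1, m2 - m1 ** 2


n = 5
p = {(i, j): 0.1 + 0.05 * ((i + 2 * j) % 7) for i, j in combinations(range(n), 2)}
print(projection_moments(p, n))
print(exact_moments(p, n))  # same values up to rounding
```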

As for the asymptotic normality, we write $V - \mathbb{E}V = (V^* - \mathbb{E}V) + (V - V^*)$, with $V^* = P_\emptyset V + \sum_{1 \le i < j \le n} P_{\{ij\}}V$. In order to show that $V^* - \mathbb{E}V$ is asymptotically normal,
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Figure 5: QQ-plots of the degree mean square statistic $W_{\Phi^0}$ in the EG design, for vanishing connection probabilities: $\Phi(u, v) = n^{-a}\Phi^{0*}(u, v)$ and initial mean density $p^* = 0.1$. Same legend as in Figure 4.


we apply Theorem 1 to the projections $P_{\{ij\}}V$ (which stand for the $X_{nu}$) by using Remark ??. The coefficients $a_{n\{ij\}} = O(1)$ expressed in (10) stand for the $a_{nu}$. Since $B_n^2 = \mathbb{V}(V^* - \mathbb{E}V) = O(n^2)$, we conclude that the Lindeberg condition is fulfilled because, for any $\varepsilon > 0$, each $a_{nu}$ becomes smaller than $\varepsilon B_n$ when $n$ goes to infinity. Now we consider $V - V^*$ as the linear combination of the projections $P_{\{ij,ik\}}V$ and $P_{\{ij,kl\}}V$. We notice that $a_{n\{ij,ik\}}$ and $a_{n\{ij,kl\}}$ given in (11) equal $O(n^{-1})$ and $O(n^{-2})$ respectively, and thus that $\mathbb{V}(V - V^*) = O(n)$. We conclude to the asymptotic normality of $V$ by combining that of $V^* - \mathbb{E}V$ and the fact that $\mathbb{V}(V - V^*)/\mathbb{V}V^* \to 0$ as $n \to \infty$.


A.2 Degree variance test power 

Lemma 1 Under model ER, the degree variance is asymptotically normal:

$$
\left(V - \mathbb{E}_{ER(\hat p)}V\right)/\mathbb{S}_{ER(\hat p)}V \xrightarrow{\mathcal D} \mathcal{N}(0, 1).
$$


Proof. The proof relies on the concentration of $\hat p$ around $p$ and on Slutsky's lemma (see, e.g., Theorem 4.4, p. 27 in Billingsley (1968)). First, write the statistic based
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on V as 


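$$
\frac{V - \mathbb{E}_{\hat p}V}{\mathbb{S}_{\hat p}V} = \frac{\mathbb{S}_pV}{\mathbb{S}_{\hat p}V}\left(\frac{V - \mathbb{E}_pV}{\mathbb{S}_pV} + \frac{\mathbb{E}_pV - \mathbb{E}_{\hat p}V}{\mathbb{S}_pV}\right).
$$

Then note that, under ER($p$), $\hat p - p = O_P(n^{-1})$, so that $\hat p\hat q - pq = (\hat p - p)(1 - p - \hat p) = O_P(n^{-1})$, where $q$ stands for $1 - p$. According to the moments given in Corollary ??, we have that $\mathbb{E}_pV = O(n)pq$ and $\mathbb{V}_pV = O(1)pq + O(n)p^2q^2$. This entails that $\mathbb{E}_{\hat p}V - \mathbb{E}_pV = O_P(1)$ and $\mathbb{V}_{\hat p}V - \mathbb{V}_pV = O_P(1)$, so that $\mathbb{S}_{\hat p}V/\mathbb{S}_pV$ converges in probability to 1 and $(\mathbb{E}_pV - \mathbb{E}_{\hat p}V)/\mathbb{S}_pV$ converges in probability to 0. The result then follows from Slutsky's lemma, used twice. ■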


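Lemma 2 We have

$$
\frac{V - \mathbb{E}_{ER(\hat p)}V}{\mathbb{S}_{ER(\hat p)}V} - \frac{V - \mathbb{E}_{ER(\bar p)}V}{\mathbb{S}_{ER(\bar p)}V} \longrightarrow 0,
$$

where $\bar p = [n(n-1)]^{-1}\sum_{i \neq j} p_{ij}$.

The proof of this lemma is similar to that of Lemma 1 and results from the concentration of $\hat p$ around $\bar p$.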

A.3 Proof of Corollary 4

The proof follows the lines of that of Proposition 1. We begin with the asymptotic normality of $V^* - \mathbb{E}V$. Since $\sum_{k} (p_{ik} + p_{jk}) = O(n^{1-a-b})$ and $\sum_{k<l} p_{kl} = O(n^{2-a-b})$, we see that $a_{n\{ij\}} = O(n^{-(a+b)})$ if $a + b < 1$ and $O(n^{-1})$ if $a + b \ge 1$ (the $a_{n\{ij\}}$ are given in (10)). Therefore, we have $\mathbb{V}P_{\{ij\}}V = O(n^{-3a-2b})$ if $a + b < 1$ and $O(n^{-a-2})$ if $a + b \ge 1$. Combining this with the number of non-zero terms, which is $O(n^{2-b})$, we get that $B_n^2 = O(n^{2-3(a+b)})$ if $a + b < 1$ and $O(n^{-(a+b)})$ if $a + b \ge 1$. Comparing $A_n^2(\varepsilon)$ with $B_n^2$, we see that the Lindeberg condition is fulfilled for $a + b < 2$.

Now we consider $V - V^*$ as the linear combination of the projections $P_{\{ij,ik\}}V$ and $P_{\{ij,kl\}}V$. We see that $a_{n\{ij,ik\}} = O(n^{-1})$ and $a_{n\{ij,kl\}} = O(n^{-2})$ (the $a_{n\{ij,ik\}}$ and $a_{n\{ij,kl\}}$ are given in (11)). Therefore, we have $\mathbb{V}P_{\{ij,ik\}}V = O(n^{-2a-2})$ and $\mathbb{V}P_{\{ij,kl\}}V = O(n^{-2a-4})$. Since the number of non-zero terms in the sums is $O(n^{3-2b})$ and $O(n^{4-2b})$ respectively, we have $\mathbb{V}(V - V^*) = O(n^{-2(a+b)+1})$.

We conclude to the asymptotic normality of $V$ by combining that of $V^* - \mathbb{E}V$ under condition $a + b < 2$ and the fact that $\mathbb{V}(V - V^*)/\mathbb{V}V^* \to 0$ as $n \to \infty$.
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A.4 Moments of $W_{\Phi^0}$ in the proof of Theorem 4

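We have

$$
\mathbb{E}(M_1^2) = \mathbb{E}\Big(\sum_{i<j} Y_{ij}^2\Big) + 2\,\mathbb{E}\Big(\sum_{1 \le i<j<k \le n} \{Y_{ij}Y_{ik} + Y_{ji}Y_{jk} + Y_{ki}Y_{kj}\}\Big) + 2\,\mathbb{E}\Big(\sum_{1 \le i<j<k<l \le n} \{Y_{ij}Y_{kl} + Y_{ik}Y_{jl} + Y_{il}Y_{jk}\}\Big) \simeq \frac{n^2}{2}\phi_1 + n^3\phi_2 + \frac{n^4}{4}(\phi_1)^2,
$$

$$
\begin{aligned}
\mathbb{E}(M_1 M_2) = {}& \binom{n}{2,1,n-3}(2\phi_2 + \phi_3) + \binom{n}{1,1,2,n-4}(\phi_5 + 2\phi_6) + \binom{n}{2,3,n-5}(3\phi_1\phi_2) \\
\simeq {}& \frac{n^3}{2}(2\phi_2 + \phi_3) + \frac{n^4}{2}(\phi_5 + 2\phi_6) + \frac{n^5}{4}\phi_1\phi_2,
\end{aligned}
$$

where the three groups of terms in $\mathbb{E}(M_1 M_2)$ collect the pairs formed by an edge and a two-star sharing two, one and no node, respectively,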
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and 


$$
\begin{aligned}
\mathbb{E}(M_2^2) = {}& \binom{n}{3,n-3}(3\phi_2 + 6\phi_3) + \binom{n}{2,1,1,n-4}(4\phi_4 + 2\phi_5 + 2\phi_6 + \phi_7) \\
&+ \binom{n}{1,2,2,n-5}(4\phi_8 + 4\phi_{10} + \phi_9) + \binom{n}{3,3,n-6}(9\phi_2^2) \\
\simeq {}& \frac{n^3}{6}(3\phi_2 + 6\phi_3) + \frac{n^4}{2}(4\phi_4 + 2\phi_5 + 2\phi_6 + \phi_7) + \frac{n^5}{4}(4\phi_8 + 4\phi_{10} + \phi_9) + \frac{n^6}{4}\phi_2^2,
\end{aligned}
$$

where the four groups of terms collect the pairs of two-stars sharing three, two, one and no node, respectively.
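The multinomial coefficients and pattern multiplicities above can be checked by brute force: enumerate all pairs (edge, two-star) or (two-star, two-star) in the complete graph on $n$ nodes and classify the union of each pair by its sorted degree sequence, which distinguishes all the patterns involved here. The Python sketch below does this for $n = 7$. The correspondence between degree sequences and $\phi_4, \dots, \phi_{10}$ indicated in the comments (e.g. $\phi_5$ the 3-star, $\phi_6$ the path with three edges) is our reading of the coefficients, not a reproduction of Figure 1, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, factorial


def multinomial(n, *parts):
    """n! / (parts[0]! ... parts[-1]! (n - sum(parts))!)."""
    out = factorial(n) // factorial(n - sum(parts))
    for k in parts:
        out //= factorial(k)
    return out


def single_edges(n):
    return [frozenset({e}) for e in combinations(range(n), 2)]


def two_stars(n):
    stars = []
    for c in range(n):
        leaves = [v for v in range(n) if v != c]
        for a, b in combinations(leaves, 2):
            stars.append(frozenset({tuple(sorted((c, a))), tuple(sorted((c, b)))}))
    return stars


def signature(edge_set):
    """Sorted degree sequence of the graph spanned by edge_set."""
    deg = Counter()
    for u, v in edge_set:
        deg[u] += 1
        deg[v] += 1
    return tuple(sorted(deg.values()))


def union_counts(first, second):
    return Counter(signature(s | t) for s in first for t in second)


n = 7
expected_m1m2 = {
    (1, 1, 2): multinomial(n, 2, 1) * 2,           # phi_2 (edge inside the two-star)
    (2, 2, 2): multinomial(n, 2, 1) * 1,           # phi_3 (triangle)
    (1, 1, 1, 3): multinomial(n, 1, 1, 2) * 1,     # phi_5 (3-star)
    (1, 1, 2, 2): multinomial(n, 1, 1, 2) * 2,     # phi_6 (path with 3 edges)
    (1, 1, 1, 1, 2): multinomial(n, 2, 3) * 3,     # phi_1 * phi_2 (disjoint)
}
expected_m2m2 = {
    (1, 1, 2): comb(n, 3) * 3,                     # phi_2 (same two-star)
    (2, 2, 2): comb(n, 3) * 6,                     # phi_3
    (1, 2, 2, 3): multinomial(n, 2, 1, 1) * 4,     # phi_4 (triangle + pendant edge)
    (1, 1, 1, 3): multinomial(n, 2, 1, 1) * 2,     # phi_5
    (1, 1, 2, 2): multinomial(n, 2, 1, 1) * 2,     # phi_6
    (2, 2, 2, 2): multinomial(n, 2, 1, 1) * 1,     # phi_7 (4-cycle)
    (1, 1, 2, 2, 2): multinomial(n, 1, 2, 2) * 4,  # phi_8 (path with 4 edges)
    (1, 1, 1, 1, 4): multinomial(n, 1, 2, 2) * 1,  # phi_9 (4-star)
    (1, 1, 1, 2, 3): multinomial(n, 1, 2, 2) * 4,  # phi_10
    (1, 1, 1, 1, 2, 2): multinomial(n, 3, 3) * 9,  # phi_2^2 (disjoint)
}
print(union_counts(single_edges(n), two_stars(n)) == Counter(expected_m1m2))
print(union_counts(two_stars(n), two_stars(n)) == Counter(expected_m2m2))
```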
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